Thread: hanging for 30sec when checkpointing
Hi,

I'm running a reasonably sized (~30GB) 7.3.4 database on Linux and I'm getting some weird performance at times.

When the db is under medium-heavy load, it periodically spawns a 'checkpoint subprocess' which runs for between 15 seconds and a minute. OK, fair enough; the only problem is that the whole box becomes pretty much unresponsive during this time - from what I can gather it's because it writes out roughly 1MB (vmstat says ~1034 blocks) per second until it's done.

Other processes can continue to run (e.g. vmstat) but other things do not (other queries, mostly running 'ps fax', etc). So everything gets stacked up till the checkpoint finishes and all is well again, until the next time...

This only really happens under medium/high load, but doesn't seem related to the length/complexity of the transactions involved.

The box isn't doing a lot else at the same time - most queries come in from separate web server boxes.

The disks, although IDE, can definitely handle more than 1MB/sec - even with multiple concurrent writes. The box is powerful (2.6GHz Xeon, 2GB RAM). It's a clean compile from source of 7.3.4, although I can't really upgrade to 7.4.x at this time as I can't afford the 18 hours' downtime to dump/restore the database. Fsync is on. Most other settings are at their defaults.

I've looked at the documentation and various bits about adjusting checkpoint segments and timings - reducing the segments/timeout is implied to be bad, but it seems to me that increasing either will just make the same thing happen less often but more severely.

If it makes any odds, this seems to happen much more often when doing bulk UPDATEs and INSERTs - although these are in transactions grouping them together - and they don't affect the same tables as the other queries that still get stalled (so it's not lock contention causing the problem).

What am I missing? I'm sure it's something blatantly obvious, but as it's only really happening on production systems (the only place with the load and the volume of data) I'm loath to experiment.

Any help appreciated,

Cheers,

Shane
Shane Wright <me@shanewright.co.uk> writes:
> When the db is under medium-heavy load, it periodically spawns a
> 'checkpoint subprocess' which runs for between 15 seconds and a minute.

It sounds like checkpoint is saturating your disk bandwidth.

> I've looked at the documentation and various bits about adjusting
> checkpoint segments and timings - but it seems reducing segments/timeout
> is implied to be bad, but it seems to me that increasing either will
> just make the same thing happen less often but more severely.

I was actually just about to suggest reducing the inter-checkpoint interval. That should reduce the volume of data that needs to be written per checkpoint. Might spread the pain a little thinner at least.

If you think you should have more disk bandwidth available than the system seems to be using, maybe there is something wrong with the disk or kernel configuration, but that's way out of my league to diagnose. You'd probably need to find some kernel hackers to help you with that.

BTW, Jan Wieck has been doing some work recently to try to reduce the "I/O storm at checkpoint" effect, but it won't appear till 7.5.

			regards, tom lane
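For reference, the knobs Tom is talking about live in postgresql.conf; in 7.3/7.4 the checkpoint interval is controlled by two settings (defaults noted in the comments - the altered timeout below is purely illustrative, not a recommendation):

    # postgresql.conf -- checkpoint spacing (defaults: 3 segments, 300 seconds)
    checkpoint_segments = 3     # force a checkpoint every N x 16MB WAL segments filled
    checkpoint_timeout  = 120   # ...or every N seconds, whichever comes first

Lowering checkpoint_timeout makes each checkpoint smaller; raising either value trades fewer, larger I/O bursts for more WAL space on disk, as discussed later in the thread.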
If I understand checkpoints correctly, data that is already written to the WAL (and flushed to disk) is being written to the DB (flushing to disk). Meanwhile, other writer transactions are continuing to busily write to the WAL. In which case a disk bandwidth problem (other than kernel config issues) may be helped by placing the WAL files on a disk (and maybe even a controller) separate from the DB.

Something to think about anyway.

regards
Iain
On Wed, 4 Feb 2004, Iain wrote: > If I understand checkpoints correctly, data that is already written to the > WAL (and flushed to disk) is being written to the DB (flushing to disk). > Meanwhile, other writer transactions are continuing to busily write to the > WAL. In which case a disk bandwidth problem (other than kernal config > issues) may be helped by placing the WAL files on a disk (and maybe even > controller) seperate from the DB. Also, running on SCSI drives will be much faster than running on IDE drives if the IDE drives have their caches disabled like they should, since they lie otherwise. Since SCSI disks don't usually lie, and are designed to handle multiple requests in parallel, they are much faster as parallel load increases. If you're writing a lot, you should either have a great number of IDE drives with the write cache turned off, like some of the newer storage devices made of ~100 IDE drives, or you should have SCSI. SCSI's advantage won't be as great as the number of drives approaches infinity. But for 1 to 10 drives my guess is that SCSI is gonna be a clear winner under parallel load.
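For anyone wanting to check the write-cache behaviour Scott describes, on Linux the IDE cache can usually be inspected and toggled with hdparm - a rough sketch only (the device name is an example, not every drive or kernel honours these commands, and the setting may not survive a power cycle, so it usually belongs in a boot script):

    # look for "Write cache" in the drive's enabled-features list
    hdparm -I /dev/hda | grep -i 'write cache'
    # disable the write-back cache so a reported fsync really is on the platter
    hdparm -W0 /dev/hda
    # re-enable it if the write performance hit is unacceptable
    hdparm -W1 /dev/hda

Expect something like the 3-4x write slowdown mentioned later in the thread once the cache is off.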
I've seen similar behaviour with other disk-intensive apps, and in every case it transpired that DMA was not enabled on the relevant disk/s - something to check, certainly.

--
Sam Barnett-Cormack
Software Developer                           |  Student of Physics & Maths
UK Mirror Service (http://www.mirror.ac.uk)  |  Lancaster University
scott.marlowe wrote:
> Also, running on SCSI drives will be much faster than running on IDE
> drives if the IDE drives have their caches disabled like they should,
> since they lie otherwise.

Don't forget the file system. Most journaling file systems are great for reliability but aren't always so hot when it comes to performance, and those that are may require tweaking. Linux EXT3, IMO, works best when the journal is kept on a different device and the file system is mounted with the data=writeback option. Those two things in our test environment were worth a speedup of close to 14%.

--
Greg Spiegelberg
 Sr. Product Development Engineer
 Cranel, Incorporated.
 Phone: 614.318.4314
 Fax:   614.431.8388
 Email: gspiegelberg@Cranel.com
Cranel. Technology. Integrity. Focus.
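As a concrete illustration of the ext3 layout Greg describes (external journal plus data=writeback), the setup looks roughly like this - device names and the mount point are examples only, and data=writeback journals metadata but not file data, so test it carefully before trusting production data to it:

    # create a dedicated external journal device, then build ext3 around it
    mke2fs -O journal_dev /dev/hdc1
    mke2fs -j -J device=/dev/hdc1 /dev/hdb1
    # mount with metadata-only journaling
    mount -o data=writeback /dev/hdb1 /var/lib/pgsql/data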
Hi

Thanks to you all for your help! I'm continually impressed with how responsive and knowledgeable y'all are!

To clarify: it's an IDE drive with a reiserfs filesystem. DMA is definitely enabled; sequential reads pull 35MB/sec sustained. The I/O caused by the checkpoint just seems to be too much while other transactions are running.

As it's a managed server at our ISP, throwing more hardware at it isn't an option at the moment unfortunately, so I think I'm left with optimising the app to reduce the number of INSERTs/UPDATEs.

Is what Iain said correct - that [committed] transactions are only written to the WAL, and the actual table data files are only updated at checkpoint? I guess it's something I hadn't thought of - in effect, the database is able to handle _bursts_ of high load, but sustaining it is hard (because a checkpoint happens sooner or later).

Hmm, that gives me an idea: for bulk processing, is there a way of detecting from a client when a checkpoint is about to happen so it can wait until it's finished? Some way that's easier than -z `ps fax | grep post | grep checkpoint`, that is ;)

Cheers

Shane

Shane Wright
Technical Manager
eDigitalResearch.com
Shane Wright <me@shanewright.co.uk> writes: > Hmm that gives me an idea, for bulk processing, is there a way of > detecting from a client when a checkpoint is about to happen so it can > wait until it's finished? No, but maybe you could think about scheduling checkpoints yourself to not coincide with your bulk jobs. You could issue CHECKPOINT commands from cron or whatever is dispatching your bulk jobs, and then widen the checkpoint-timing parameters in postgresql.conf enough to avoid automatic CHECKPOINTs. The only real drawback of less-frequent CHECKPOINTs is that the amount of WAL space required goes up proportionally. (Oh, and it might take proportionally longer to recover after a crash, too.) regards, tom lane
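To make that concrete, the combination Tom describes might look something like the following - the database name, schedule and parameter values are only examples:

    # postgresql.conf -- widen the automatic interval so the cron-driven
    # CHECKPOINTs do most of the work
    checkpoint_segments = 30
    checkpoint_timeout  = 3600

    # crontab for the postgres user -- checkpoint at 02:50, just before
    # the 03:00 bulk load kicks off
    50 2 * * *  psql -d mydb -c "CHECKPOINT;"

As Tom notes, the cost is more WAL space on disk and a potentially longer crash recovery.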
Tom,

Damn, why didn't I think of that myself...

Although, is there any performance implication of not doing checkpoints very often? (Aside from, I assume, that each checkpoint will take longer and hence saturate the available I/O for longer.)

Cheers

Shane

On 6 Feb 2004, at 15:33, Tom Lane wrote:
> No, but maybe you could think about scheduling checkpoints yourself
> to not coincide with your bulk jobs. You could issue CHECKPOINT
> commands from cron or whatever is dispatching your bulk jobs, and then
> widen the checkpoint-timing parameters in postgresql.conf enough to
> avoid automatic CHECKPOINTs.
>
> The only real drawback of less-frequent CHECKPOINTs is that the amount
> of WAL space required goes up proportionally. (Oh, and it might take
> proportionally longer to recover after a crash, too.)
>
> 			regards, tom lane

Shane Wright
Technical Manager
eDigitalResearch.com
Shane Wright <me@shanewright.co.uk> writes: > Although, is there any performance implication of not doing checkpoints > very often? (aside from, I assume, that each checkpoint will take > longer and hence saturate available I/O for longer) Mmm, right, I forgot to mention that one, which was a bit silly considering that that was the focus of the discussion. No, infrequent checkpoints won't hurt performance between checkpoints. regards, tom lane
On Fri, Feb 06, 2004 at 12:34:47PM -0500, Tom Lane wrote: > Mmm, right, I forgot to mention that one, which was a bit silly > considering that that was the focus of the discussion. No, infrequent > checkpoints won't hurt performance between checkpoints. In my experience, though, they do hurt _quite substantially_ during the checkpoint. If you need fast all the time, cranking that interval too high will definitely hurt you. A -- Andrew Sullivan | ajs@crankycanuck.ca The fact that technology doesn't work is no bar to success in the marketplace. --Philip Greenspun
scott.marlowe wrote: > Also, running on SCSI drives will be much faster than running on IDE > drives if the IDE drives have their caches disabled like they should, > since they lie otherwise. Since SCSI disks don't usually lie, and are > designed to handle multiple requests in parallel, they are much > faster as parallel load increases. If you're writing a lot, you > should either have a great number of IDE drives with the write cache > turned off, like some of the newer storage devices made of ~100 IDE > drives, or you should have SCSI. SCSI's advantage won't be as great > as the number of drives approaches infinity. But for 1 to 10 drives > my guess is that SCSI is gonna be a clear winner under parallel load. Nice to see old fashioned misinformation being spread around the place... Peter
On Sat, Feb 07, 2004 at 10:56:57AM -0000, Peter Galbavy wrote: > Nice to see old fashioned misinformation being spread around the place... Do you care to clarify that remark? Because there's _plenty_ of evidence that some IDE drives do not tell the truth about what they're doing, and that SCSI has been, historically anyway, better at dealing with parallel loads. This is because of the historical design goals of the two different systems. Glib remarks about misinformation are easy to make, but don't really help anyone make decisions. A -- Andrew Sullivan | ajs@crankycanuck.ca The fact that technology doesn't work is no bar to success in the marketplace. --Philip Greenspun
On Sat, 7 Feb 2004, Peter Galbavy wrote:
> scott.marlowe wrote:
> > [SCSI vs IDE write-cache advice snipped]
>
> Nice to see old fashioned misinformation being spread around the place...

I don't know who you think you are, but I've physically tested the stuff I'm talking about. Care to qualify what you mean?

IDE drives (all the ones I've ever tested) LIE about their write caches and fsync. don't believe me? Simple, hook one up, initiate 100 parallel transactions, pull the power plug, watch your database fail to come back up due to the corruption caused by the LYING IDE drives.

Do the same with SCSI. watch the database come right back to life.

If you're gonna accuse me of lying, you damned well better have the balls AND evidence to back it up.
> > If you're gonna accuse me of lying, you damned well better have the balls > AND evidence to back it up. > Wow. Scott, all traffic to the admin list has ceased since you posted this, we are shocked! You put a lot of effort into finding the root of the 'mysteriously good IDE performance' issue last year, and did so with great professionalism and thoroughness, so it's understandable that this chap's comments would upset you. If he doesn't have the wit to respond then he's just a troll anyway. M
On Mon, 9 Feb 2004 matt@ymogen.net wrote:
> > If you're gonna accuse me of lying, you damned well better have the balls
> > AND evidence to back it up.
>
> Wow. Scott, all traffic to the admin list has ceased since you posted
> this, we are shocked!
>
> You put a lot of effort into finding the root of the 'mysteriously good
> IDE performance' issue last year, and did so with great professionalism
> and thoroughness, so it's understandable that this chap's comments would
> upset you.
>
> If he doesn't have the wit to respond then he's just a troll anyway.

Well, I still feel I should apologize for my harshness on a public list. I could have at least taken that one private. But you are right, I did spend a lot of my time chasing down the IDE issues, so to be so flippantly accused of spreading misinformation rankled me a bit.

So, to anyone on the admin list that was offended, I apologize...
>> If he doesn't have the wit to respond then he's just a troll anyway. > > Well, I still feel I should apologize for my harshness on a public list. > I could have at least taken that one private. But you are right, I did > spend a lot of my time chasing down the IDE issues, so to be so flippantly > accused of spreading misinformation rankled me abit. > > So, to anyone on the admin list that was offended, I apologize... > Personally I was very amused. This list is an oasis of calm for the most part, so a little drama doesn't hurt. What rankled me about the original flippant dismissal was precisely that it's almost unheard of for BS to be posted here, so the comment was actually rude not just to you, but to the list as a whole. M
Scott,

If you feel it is necessary to apologize for such a minor infraction of polite etiquette, please come on over to Oracle-L. We have harshness 10 times greater, probably because there are so many practitioners and so many different points of view. We call them "Holy Wars". The current blazing one is on RAID: the good, the bad, and the ugly.

BTW: From a Holy War on Oracle-L of similar topic. There is a difference in how bad that lying IDE drive is depending on who the vendor is, what system it's plugged into, and what OS is being used. Some do a better job than others of "covering up" the lies. The other chap may have one of those better systems, so from his point of view it's "old fashioned misinformation". Doesn't mean it's not true, just covered up better. Kind of like "Air Freshener".

Dick Goulet
Senior Oracle DBA
Oracle Certified 8i DBA
On Mon, 9 Feb 2004, Goulet, Dick wrote:
> If you feel it is necessary to apologize for such a minor infraction of
> polite etiquette please come on over to Oracle-L. We have harshness 10
> times greater. Probably because there are so many practitioners and so
> many different points of view. We call them "Holy Wars". The current
> blazing one is on RAID, the good, the bad, and the ugly.

Hehe, I grew up on FIDO net, so I know all about the flammage... :-)

I can still remember the amiga versus atari ST holy wars of old.

> BTW: From a Holy War on Oracle-L of similar topic. There is a difference
> in how bad that lying IDE drive is depending on who the vendor is, what
> system it's plugged into, and what OS is being used. Some do a better job
> than others of "covering up" the lies. The other chap may have one of
> those better systems, so from his point of view it's "old fashioned
> misinformation". Doesn't mean it's not true, just covered up better.
> Kind of like "Air Freshener".

Well, it's interesting that during all the testing I and many others were doing last year, it appeared the escalade IDE RAID controllers were doing SOMETHING (no one is quite sure if it was disabling the write cache or not, but we guessed that was so) that made the IDE drives under them safe from the power-off data loss issue that IDE drives seem to suffer from.

As for the OS, my guess is that some OSes probably just insert some delay between receiving an fsync notification from an IDE drive and reporting it back to the application, or something like that, which makes them appear safe. Such situations often result in systems that only fail under very heavy concurrent load, but pass the test under light to medium concurrency.

That said, we have a really HUGE (~200 drive) IDE storage array my web / app server sits on top of. No clue if that thing will reliably work under a database, and I'm in no hurry to find out. But since the fsync on WAL is all that seems important, I could always initlocation a big chunk of it and keep the WAL local and I should be ok.
scott.marlowe wrote:
> Well, it's interesting that during all the testing I and many others were
> doing last year, it appeared the escalade IDE RAID controllers were doing
> SOMETHING (no one is quite sure if it was disabling the write cache or not,
> but we guessed that was so) that made the IDE drives under them safe from
> the power-off data loss issue that IDE drives seem to suffer from.

Maybe the RAID card has a battery-backed write cache.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
"scott.marlowe" <scott.marlowe@ihs.com> writes: > That said we have a really HUGE (~200 drive) IDE storage array my web / > app server sits on top of. No clue if that thing will reliably work under > a database, and I'm in no hurry to find out. > But since the fsync on WAL is all that seems important, I could always > initlocation a big chunk of it and keep the WAL local and I should be ok. Unfortunately not --- at checkpoint time, the constraint goes the other way. We have to be sure all the data file updates are down to disk before we write a checkpoint record to the WAL log. So you can still get screwed if the data-file drive lies about write completion. regards, tom lane
scott.marlowe wrote:
> I don't know who you think you are, but I've physically tested the
> stuff I'm talking about. Care to qualify what you mean?

I would genuinely be interested in seeing the results and the methodology.

> IDE drives (all the ones I've ever tested) LIE about their write
> caches and fsync. don't believe me? Simple, hook one up, initiate
> 100 parallel transactions, pull the power plug, watch your database
> fail to come back up due to the corruption caused by the LYING IDE
> drives.

See my comment/question below.

> Do the same with SCSI. watch the database come right back to life.
>
> If you're gonna accuse me of lying, you damned well better have the
> balls AND evidence to back it up.

I am NOT accusing anyone of lying, least of all people I don't personally know, and certainly not you. What I am referring to is over-generalisation. You made a long and detailed generalisation, without detailing anything.

My primary question, without seeing the way you did it, is: can you comment on whether you wrote your own testbed, or did you rely on potentially flawed OS interfaces? Did you use a signal analyser?

Now, I have *not* done the tests - hence my real interest - but I have had at least as many problems with SCSI sub-systems as with IDE over the years. Probably more, actually. Ever since using IBM EIDE drives (the 75GXP included, I am a lucky one) I have had very little, knock on wood, to worry about even during power failures.

Peter
"scott.marlowe" <scott.marlowe@ihs.com> writes: >> Unfortunately not --- at checkpoint time, the constraint goes the other >> way. We have to be sure all the data file updates are down to disk >> before we write a checkpoint record to the WAL log. So you can still >> get screwed if the data-file drive lies about write completion. > Hmmm. OK. Would the transaction size be an issue here? I.e. would small > transactions likely be safer against corruption than large transactions? Transaction size would make no difference AFAICS. Reducing the interval between checkpoints might make things safer in such a case. > I ask because most of the testing I did was with pgbench running 100+ > simos (on a -s 100 pgbench database) and as long as the WAL drive was > fsyncing correctly, the database survived. Did you try pulling the plug immediately after a CHECKPOINT command completes? You could test by manually issuing a CHECKPOINT while pgbench runs, and yanking power as soon as the prompt comes back. regards, tom lane
On Tue, 10 Feb 2004, Tom Lane wrote:
> Did you try pulling the plug immediately after a CHECKPOINT command
> completes? You could test by manually issuing a CHECKPOINT while
> pgbench runs, and yanking power as soon as the prompt comes back.

I will try that. Thanks for the tip. I'll let you know how it works out.
On Mon, 9 Feb 2004, Tom Lane wrote: > "scott.marlowe" <scott.marlowe@ihs.com> writes: > > That said we have a really HUGE (~200 drive) IDE storage array my web / > > app server sits on top of. No clue if that thing will reliably work under > > a database, and I'm in no hurry to find out. > > > But since the fsync on WAL is all that seems important, I could always > > initlocation a big chunk of it and keep the WAL local and I should be ok. > > Unfortunately not --- at checkpoint time, the constraint goes the other > way. We have to be sure all the data file updates are down to disk > before we write a checkpoint record to the WAL log. So you can still > get screwed if the data-file drive lies about write completion. Hmmm. OK. Would the transaction size be an issue here? I.e. would small transactions likely be safer against corruption than large transactions? I ask because most of the testing I did was with pgbench running 100+ simos (on a -s 100 pgbench database) and as long as the WAL drive was fsyncing correctly, the database survived.
On Tue, 10 Feb 2004, Peter Galbavy wrote:
> scott.marlowe wrote:
> > I don't know who you think you are, but I've physically tested the
> > stuff I'm talking about. Care to qualify what you mean?
>
> I would genuinely be interested in seeing the results and the methodology.
>
> I am NOT accusing anyone of lying, least of all people I don't personally
> know, and certainly not you. What I am referring to is over-generalisation.
> You made a long and detailed generalisation, without detailing anything.

Oh, spreading misinformation isn't lying? You live in a different world than I do. From www.dictionary.com:

misinformation \Mis*in`for*ma"tion\, n. Untrue or incorrect information. --Bacon.
Source: Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc.

> My primary question, without seeing the way you did it, is: can you comment
> on whether you wrote your own testbed, or did you rely on potentially flawed
> OS interfaces? Did you use a signal analyser?

Last year, I and several others on the pgsql lists ran a series of tests to determine which drive subsystems could survive power-off tests. We ran the tests by initiating dozens or hundreds of simultaneous transactions against a postgresql machine, then pulling the plug in the middle. Due to the nature of postgresql, if a drive reports an fsync before it has actually written out its cache, the database will be corrupted and refuse to start up when the machine is powered back up.

Signal analyzers are nice, but if the database doesn't work, it doesn't really matter what the signal analyzer says. If you'd like to set one up and test that way, be my guest, but the "rubber hitting the road" is when you simulate the real thing: losing power during transactions.

Here's what we found:

SCSI drives (at least all the ones we tested; I tested Seagate 18 gig 10krpm barracudas, many others were tested), as a group, passed the test with flying colors. No one at that time found a single SCSI drive that failed it.

IDE drives, with write cache enabled, failed 100% of the time.

IDE drives, with write cache disabled, passed 100% of the time.

SCSI RAID controllers with battery-backed cache set to write back passed.

The IDE RAID controller from Escalade passed. I don't recall if we ever found out if it had battery-backed cache, or if it disabled the cache on the drives.

Performance-wise, the IDEs were neck and neck with the SCSI drives when they had their write caches enabled. When the write cache was disabled, their performance was about 1/4 to 1/3 as fast as the SCSI drives. The SCSI RAID card (lsi megaraid is what I tested, someone else tested the adaptec) with battery-backed cache, as well as the escalade, were great performers.

> Now, I have *not* done the tests - hence my real interest - but I have had
> at least as many problems with SCSI sub-systems as with IDE over the years.
> Probably more, actually. Ever since using IBM EIDE drives (the 75GXP
> included, I am a lucky one) I have had very little, knock on wood, to worry
> about even during power failures.

I've built servers with both IDE and SCSI in the past 5 years, and my experience has been that while IDE is fine for file / print servers, it's a disaster waiting to happen under postgresql. Keep in mind, we're not talking drive failure rates, or cabling / termination issues here; we're talking about the fact that with IDE drives (without the escalade controller) you have two choices: fast, or safe. With SCSI, you get both. With the RAID controllers mentioned you have both.

While my post may have seemed like simple uninformed opinion to you at the time you read it, it was, in fact, backed up by weeks of research by both myself and a handful of other people on the postgresql mailing lists. Your extremely flippant remark could just as easily have been a request for more information on how I had reached those conclusions, but no. It had to be an accusation of lying. And that IS what it was. No amount of hand waving by you can change the fact that you accused me of disseminating misinformation, which is disseminating untruths, which is lying.
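For anyone wanting to reproduce this kind of plug-pull test, the rough outline is below - a sketch only, not the exact procedure used in the tests above, and it is destructive, so run it on a scratch box (pgbench ships in contrib; the scale and client counts are just examples):

    # build a scale-100 pgbench database, then hammer it with 100 clients
    pgbench -i -s 100 testdb
    pgbench -c 100 -t 100000 testdb &
    sleep 60     # let a good backlog of writes build up
    # now pull the power cord, plug it back in, restart postgres, and see
    # whether the database comes back up and recovers cleanly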
scott.marlowe wrote:
> Oh, spreading misinformation isn't lying? You live in a different
> world than I do.

Again, I apologise if you took my comment so strongly. I can understand when something that someone works on so hard is criticised. OTOH, however, your original post that I replied to was very presumptive and over-generalised. If you say 100% of all IDE drives with write caches from 100% of manufacturers are broken, then I simply do not believe you.

> From www.dictionary.com:
>
> misinformation \Mis*in`for*ma"tion\, n. Untrue or incorrect information. --Bacon.
> Source: Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc.

Given the recent political issues over a certain military action, my use of the word "misinformation" used the wider accepted definition, including "missing" information. Your reply below corrected that to some extent, but I still have questions.

> Last year, I and several others on the pgsql lists ran a series of
> tests to determine which drive subsystems could survive power-off
> tests. We ran the tests by initiating dozens or hundreds of
> simultaneous transactions against a postgresql machine, then pulling
> the plug in the middle.

Thanks for that and the subsequent detail. What is still missing for me is the simple question: "What OSes were tested?" And more specifically, was the OS driver code compared for the SCSI and IDE subsystems? It is highly possible and probable that the underlying OS drivers for IDE and SCSI were written by different people, with different attention to following the standards documentation.

Is any of this written up anywhere with more details? It would make very interesting reading.

rgds,
--
Peter
On Wed, 11 Feb 2004, Peter Galbavy wrote: > scott.marlowe wrote: > > Oh, spreading misinformation isn't lying? You live in a different > > world than I do. > > Again, I apologise if you took my comment so strongly. I can understand when > something that someone works on so hard is critisised. OTOH however, your > original post that I replied to, was very presumptive and over generalised. > If you say 100% of all IDE drives with write caches from 100% of > manufacturers are broken, then I simply do not believe you. Find me one that doesn't lie. No one I know has found one yet. And we've looked around. I'd guess though, that with the current state of IDE drivers in the two common free unixes / clones, that if fsync was obeyed, the throughput on writes would drop quite a bit, since those drivers are pretty much one thing at a time oriented. Your beliefs won't change the fact that no one has shown a single IDE drive that doesn't lie. The fact that no one has shown a drive that doesn't lie doesn't prove they all do either. But until I see one that behaves properly, I'll err on the side of caution, and assume they all do. And the above paragraph reads like part of the SQL 92 spec... :-) > > Last year, I and several others on the pgsql lists ran a series of > > tests to determine which drive subsystems could survive power off > > tests. We ran the tests by initiating dozens or hundreds of > > simultaneous transactions against a postgresql machine, then pulling > > the plug in the middle. > > Thanks for that and the subsequent detail. What is still missing for me is > the simple question; "What OSes were tested ?" and more specifically was the > OS driver code compared for the SCSI and IDE subsystems ? It is highly > possible and probable that underlying OS drivers for IDE and SCSI were > written by different people and different attention to following standards > documentation. I tested linux, someone else tested on BSD. I do not know if any other flavors of Unix were tested. It's in the archives, so you can search them if you want to see. We looked into it fairly closely, and if you read the comments in the IDE code for both the BSD kernel and Linux kernel, you will see comments to the fact that IDE drives basically all lie about fsync. And stuff about how to get the manufacturers to make drives that don't. Basically, the single threaded design of earlier IDE interfaces is still pretty much what's implemented today, and the latest IDE interface specs seem to allow some kind of command queueing, but no one has a driver to take advantage of it for either BSD or linux. I'm a bit rusty on details, it's been about 6 months or so. Searching for fsync and IDE on the BSD and linux kernel mailing lists should bring up some interesting results. Who knows, the 2.6 linux kernel or latest BSD kernels may finally be addressing these issues. > Is any of this writtent up anywhere with more details ?It would make very > interesting reading. Other than in the archives, I don't think so.
Hi again,

I have started working on migrating my DBs from mysql to postgres today and ran into a problem. When I type in psql, I get this error:

Welcome to psql 7.3.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

psql: relocation error: psql: undefined symbol: PQgetssl

It is showing psql as being version 7.3.3 even though I upgraded to 7.4.0 last night. But when I installed the RPM for the libraries (among other things) I got errors like this:

file /usr/share/locale/zh_CN/LC_MESSAGES/libpq.mo from install of postgresql-libs-7.4-0.3PGDG conflicts with file from package postgresql-libs-7.3.3-1PGDG

Should I just go through one by one and delete the files that it is telling me I have a conflict with?

Thanks a lot!
Jeremy
To follow up on this, I just did try deleting those files and running the RPM process again, and received the same errors.

rpm -ih postgresql-libs-7.4-0.3PGDG.i386.rpm
########################################### [100%]
file /usr/lib/libpq.so.3 from install of postgresql-libs-7.4-0.3PGDG conflicts with file from package postgresql-libs-7.3.3-1PGDG
.
.
.

Jeremy
Or maybe rpm -Uvh postgresql-7.4.rpm would work too.
You need to rpm -e 'oldpostgresqlpackagenamehere' to get rid of 7.3 first.

On Wed, 11 Feb 2004, Jeremy Smith wrote:
> To follow up on this, I just did try deleting those files and running the
> RPM process again, and received the same errors.
>
> rpm -ih postgresql-libs-7.4-0.3PGDG.i386.rpm
> ########################################### [100%]
> file /usr/lib/libpq.so.3 from install of postgresql-libs-7.4-0.3PGDG
> conflicts with file from package postgresql-libs-7.3.3-1PGDG
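Putting the two suggestions together, the cleanup might look roughly like this - the exact package names will differ, so check what rpm -qa actually reports before removing anything:

    # see which 7.3 packages are still installed
    rpm -qa | grep -i postgresql
    # remove the old libs (plus any other 7.3 packages listed), then install 7.4
    rpm -e postgresql-libs-7.3.3-1PGDG
    rpm -ih postgresql-libs-7.4-0.3PGDG.i386.rpm
    # ...or let rpm replace the old package in one step
    rpm -Uvh postgresql-libs-7.4-0.3PGDG.i386.rpm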
me@shanewright.co.uk (Shane Wright) wrote in message news:<40202216.4010608@shanewright.co.uk>...
> When the db is under medium-heavy load, it periodically spawns a
> 'checkpoint subprocess' which runs for between 15 seconds and a minute.
> Ok, fair enough, the only problem is the whole box becomes pretty much
> unresponsive during this time

I am having a similar problem and this is what I've found so far:

During the checkpoint the volume of data that's written isn't very high and it goes on for a fairly long time (up to 20 seconds) at a rate that appears to be well below our disk array's potential. The volume of data written is usually 1-5 MB/sec on an array that we've tested to sustain over 50 MB/sec (sequential writes, of course).

It turns out that what's going on is that the command queue for the RAID array (3Ware RAID card) is filling up during the checkpoint and is staying at the max (254 commands) for most of the checkpoint. The odd lucky insert appears to work, but is extremely slow. In our case, the WAL files are on the same array as the data files, so everything grinds to a halt.

The machine we're running it on is a dual processor box with 2GB RAM. Since most database read operations are being satisfied from the cache, reading processes don't seem to be affected during the pauses.

I suspect that increasing the checkpoint frequency could help, since the burst of commands on the disk channel would be shorter. (it's currently 300 seconds)

I have found that the checkpoint after a vacuum is the worst. This was the original problem which led to the investigation.

Besides more frequent checkpoints, I am at a loss as to what to do about this. Any help would be appreciated.

Thanks,
Greg
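A crude way to confirm that the stalls line up with checkpoints is to log I/O and the checkpoint subprocess side by side while the problem is happening - a sketch only, assuming sysstat's iostat is installed (the 'checkpoint' string is what shows up in ps, as Shane noted earlier in the thread):

    # sample block I/O, per-device utilisation and checkpoint activity every second
    vmstat 1 > vmstat.log &
    iostat -x 1 > iostat.log &
    while true; do
        ( date; ps ax | grep '[c]heckpoint' ) >> checkpoint.log
        sleep 1
    done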
> > I'm running a reasonable sized (~30Gb) 7.3.4 database on Linux > > and I'm getting some weird performance at times. <snip> > I am having a similar problem and this is what I've found so far: > > During the checkpoint the volume of data that's written isn't very > high and it goes on for a fairly long time (up to 20 seconds) at a > rate that appears to be well below our disk array's potential. The > volume of data written is usually 1-5 MB/sec on an array that we've > tested to sustain over 50 MB/sec (sequential writes, of course). > > It turns out that what's going on is that the command queue for the > RAID array (3Ware RAID card) is filling up during the checkpoint > and is staying at the max (254 commands) for most of the > checkpoint. The odd lucky insert appears to work, but is extremely > slow. In our case, the WAL files are on the same array as the data > files, so everything grinds to a halt. I spoke with some 3Ware reps at a trade show and they recommended adding the following to /etc/sysctl.conf: vm.max-readahead = 256 vm.min-readahead = 128 These settings take effect at boot. To change on a running system: echo 256 > /proc/sys/vm/max-readahead echo 128 > /proc/sys/vm/min-readahead This advice was specific to the 3Ware card on Linux. Cheers, Steve
On Thu, 12 Feb 2004, Steve Crawford wrote:
> I spoke with some 3Ware reps at a trade show and they recommended
> adding the following to /etc/sysctl.conf:
> vm.max-readahead = 256
> vm.min-readahead = 128
>
> These settings take effect at boot. To change on a running system:
> echo 256 > /proc/sys/vm/max-readahead
> echo 128 > /proc/sys/vm/min-readahead

Note that if you edit /etc/sysctl.conf and want the changes to take effect, you can do so with:

sysctl -p
Greg Mennie wrote:
> It turns out that what's going on is that the command queue for the
> RAID array (3Ware RAID card) is filling up during the checkpoint and
> is staying at the max (254 commands) for most of the checkpoint. The
> odd lucky insert appears to work, but is extremely slow. In our case,
> the WAL files are on the same array as the data files, so everything
> grinds to a halt.
>
> Besides more frequent checkpoints, I am at a loss as to what to do
> about this. Any help would be appreciated.

My guess is that the background writer will improve checkpoint performance greatly in 7.5. Jan has run tests that show the background writer really reduces the checkpoint write storm.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Thu, 12 Feb 2004, Steve Crawford wrote: > I spoke with some 3Ware reps at a trade show and they recommended > adding the following to /etc/sysctl.conf: > vm.max-readahead = 256 > vm.min-readahead = 128 > > These settings take effect at boot. To change on a running system: > echo 256 > /proc/sys/vm/max-readahead > echo 128 > /proc/sys/vm/min-readahead > > This advice was specific to the 3Ware card on Linux. > > Cheers, > Steve > Thanks for the information, Steve. It didn't cure the write storm, but I'm sure I gained some read performance. Every little bit helps... - Greg