Re: 8.3rc1 Out of memory when performing update - Mailing list pgsql-performance

From Stephen Denne
Subject Re: 8.3rc1 Out of memory when performing update
Date
Msg-id F0238EBA67824444BC1CB4700960CB480482B748@dmpeints002.isotach.com
Whole thread Raw
In response to Re: 8.3rc1 Out of memory when performing update  ("Stephen Denne" <Stephen.Denne@datamail.co.nz>)
Responses Re: 8.3rc1 Out of memory when performing update  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
>>"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
>>> I altered the update statement slightly, and reran the query.
>>> The altered query has been running over 3 hours now,
>>> without using lots of memory (38M private bytes).
>>> 2046 temp files were created (2.54GB worth),
>>> which have recently changed from slowly growing in size
>>> to very very slowly reducing in number.

To which Tom Lane replied:
>>Hmm.  I think what that really means is you haven't got to the part of
>>the query where the leak is :-(.

I then said:
> It is now Friday night for me. I left the alternate query running, and will find out on Monday what happened.

Well, it is now Monday morning for me, and those temp files are still slowly reducing in number.
There are now only 1629 of them left, so I'm guessing that the query is about 20% done.
The PC seems to have been steadily but very slowly working away at this very simple query for close to 70 hours.
I decided not to leave this query running for a fortnight to find out if I then strike the memory leak.
Private Bytes had grown to 685MB
I cancelled the query.

Rough snapshot of what was happening with IO (a single 7200 IDE disk):

The process for the update query was reading about 500KB/second , writing between 80KB/second to 200KB/second.
The stats collector process was writing about 100KB/second
The wal writer process was writing about 200KB/second
The writer process was writing about 400KB/second
Checkpoints were 10 minutes apart, taking about 85 seconds to write 1000 buffers.

What could cause such poor performance?
I presume that the disk was being forced to move the head a great deal.

I also asked:
> If I drop the fk constraint, and/or its index, would I still be affected by the leak you found?

I dropped two indexes and one fk constraint and ran VACUUM FULL VERBOSE ANALYZE document_file;
As an indication of the disk performance: at its peak the vacuum process was reading and writing 20MB/seconds
(sustained),completing in less than 11 minutes. 

I reran the original query.
It used constant memory (6.7MB private bytes)
It was reading 2 to 3MB/second, writing 3 to 6MB/second.
The stats collector process was writing about 100KB/second
The wal writer process was writing about 200KB/second
The writer process was initially writing about 1MB/second, increasing to about 3MB/second

Checkpoints in the middle of this query were taking up to 13 seconds to write 100 buffers.

The checkpoint after the query took 300 seconds (exactly half the checkpoint interval), and was writing about
60KB/second.It wrote 2148 buffers. 

So dropping the fk constraint and index results in successful query execution with constant memory usage. Does this
confirmthat the memory leak you found is the one I was suffering from? 

I'd also class the extremely poor performance of the alternate query as a bug.
Why take a fortnight when you could take three quarters of an hour? (Granted there where two less indexes to update,
butthat is still too long.) 

Aside: I must say that I am impressed with PostgreSQL's handling of this connection. It recovers extremely well from
runningout of memory, cancelling very long running queries, reloading config (to turn autovacuum off), and continues to
workas expected (the 3 day old connection that is). 

Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any
attachmentsis confidential and may be subject to legal privilege.  If it is not intended for you please advise by reply
immediately,destroy it and do not copy, disclose or use it in any way. 

__________________________________________________________________
  This email has been scanned by the DMZGlobal Business Quality
              Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________


pgsql-performance by date:

Previous
From: Marcin Stępnicki
Date:
Subject: Re: Slow set-returning functions
Next
From: "Merlin Moncure"
Date:
Subject: Re: Slow set-returning functions