Re: Running out of memory at vacuum - Mailing list pgsql-general

From Ioana Danes
Subject Re: Running out of memory at vacuum
Date
Msg-id 1368711339.50016.YahooMailNeo@web164603.mail.gq1.yahoo.com
In response to Re: Running out of memory at vacuum  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Running out of memory at vacuum  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
Hi Jeff,

On Tuesday, May 14, 2013, Ioana Danes wrote:
Hi all,

I have a production database that sometimes runs out of memory at the nightly vacuum.

The application typically runs with around 40 postgres connections, but there are times when the connection count increases because of queries going on.

You should consider not allowing that to happen.  If 40 connections is the sweet spot for your hardware, then you can't change reality simply by changing the connection pool setting.
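For reference, a cap like that can also be enforced server-side, independent of the pool. A minimal sketch, assuming the application logs in as a single role; "appuser" and "appdb" are placeholder names, and 40 just echoes the sweet-spot figure above:

    -- Hard cap enforced by PostgreSQL itself, so a misconfigured
    -- pool cannot exceed it ("appuser"/"appdb" are hypothetical):
    ALTER ROLE appuser CONNECTION LIMIT 40;
    -- Or per database, if several roles share it:
    ALTER DATABASE appdb CONNECTION LIMIT 40;

The point of doing it this way is that the limit lives in the server rather than in every client's pool configuration.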
 
Yes I agree

The reason is that when operations are slow, the terminals time out and try to reconnect on new connections.


So, "The beatings will continue until morale improves".  You should consider not allowing that to happen, either.  If the original operation is slow, why would trying it again on a new connection (while leaving the original one behind to clog things up) be any better?

Yes, I agree; it is just hard to distinguish between genuine and spurious requests for new connections (high sales volume vs. bad queries).
 
Some time ago I started to have problems with too many connections being open, so I lowered the limit to 300 connections. That was fine until recently, when I started running out of memory on vacuum even with 180 connections. So the connections increase to 180, the system still runs properly for another two days, but then the nightly vacuum runs out of memory.
The fix is to restart postgres... If I only close the connections the problem is still there, so I need to restart postgres.
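An aside that may bear directly on the vacuum failure: a VACUUM can allocate up to maintenance_work_mem on top of whatever the piled-up backends already hold, so checking and, if needed, lowering that setting is cheaper than a restart. A sketch; the 256MB figure is only an example, and ALTER SYSTEM needs 9.4+ (on older releases, edit postgresql.conf and reload instead):

    SHOW maintenance_work_mem;
    -- Lower the ceiling VACUUM may allocate for its dead-tuple list:
    ALTER SYSTEM SET maintenance_work_mem = '256MB';
    SELECT pg_reload_conf();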

How are you closing the connections?

I restart the application server. The problem is that maxIdle was set to 1000 on the JDBC connection pool, so once the spike happened the app would keep running with 300 connections, 250 or so of them IDLE most of the time. I am fixing that.
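Once maxIdle is fixed, a quick way to verify the pool is behaving, again assuming 9.2+ for the state column:

    -- How many connections are in each state right now:
    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state
    ORDER BY count(*) DESC;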
 
If I don't restart postgres, then at some point the system will run out of memory on queries too...
Another important thing is that during the vacuum at 1 AM nothing else is going on on that server, so all the connections are idle.

Truly idle, or idle in transaction, or not even that?  If the "abandoned" connections have completed whatever long running thing caused them to be abandoned, then they should no longer exist.  What are the largest processes according to top?
They were IDLE (not IDLE IN TRANSACTION).
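For completeness, a per-connection view that answers exactly that question (again assuming 9.2+):

    -- How long each backend has been in its current state, and the
    -- last statement it ran:
    SELECT pid, usename, state,
           now() - state_change AS in_state_for,
           query
    FROM pg_stat_activity
    ORDER BY in_state_for DESC;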

You seem to have a series of unfortunate events here, but I think you are tackling them from the wrong end.  What are these slow queries that take up a lot of memory, and why can't they be improved?  Given that you have slow queries that take a lot of memory, why does your app respond to this by launching a Denial of Service attack against its own database, and why do you let it do that?
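A sketch for catching those queries, assuming superuser access; the 5-second threshold is an example, ALTER SYSTEM needs 9.4+, and the second query requires the pg_stat_statements extension (the column is total_time before version 13 and total_exec_time from 13 on):

    -- Log every statement slower than 5 seconds:
    ALTER SYSTEM SET log_min_duration_statement = '5s';
    SELECT pg_reload_conf();

    -- Worst offenders by cumulative execution time:
    SELECT query, calls, total_exec_time, rows
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;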

I never said that I am ignoring the bad queries or the bad processes. Sometimes it just takes time to catch and fix them; that is just the reality...
The problem is not always obvious or easy to reproduce.


Thank you very much for your response,
ioana
