Re: is it cool to restart servers as preventive maintenance? - Mailing list pgsql-admin

From Kiriakos Georgiou
Subject Re: is it cool to restart servers as preventive maintenance?
Date
Msg-id 27A8518A-47E5-4BD0-9EF0-6AFEBE4A4DB3@olympiakos.com
Whole thread Raw
In response to Re: is it cool to restart servers as preventive maintenance?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: is it cool to restart servers as preventive maintenance?
List pgsql-admin
> On Feb 10, 2016, at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Kiriakos Georgiou <kg.postgresql@olympiakos.com> writes:
>> In the last 12 months I have noticed 3-4 instances of database flakiness that is cured by restarting.
>> I’ve been using PostgreSQL since 2007 and I haven’t seen such issues requiring a reboot, but on my current project
wedo some rather heavy duty PostGIS analysis that apparently stresses the system enough to occasionally cause this
(that’smy theory anyway.)  I’m beginning to seriously consider restarting servers on a monthly basis. 
>
> What sort of "database flakiness"?
>
> It's possible you're encountering some kind of bug (memory leak?) in
> PostGIS, but that would be a bug you ought to get them to fix, not a
> reason why periodic restarts are a good idea.
>
>             regards, tom lane
>



Flaky = the database appears to be running OK (I can run queries via psql) but our app is down for no apparent reason.
Restartingthe app servers multiple times did not help.  Although the database seemed to respond fine to queries via
psql,I decided to restart it.  That was a good move, our app worked fine after the database restart. 

There is more to it: about 8 hours earlier our warm standby postgresql filled up the volume it puts the server logs by
repeatingthe following two lines in the server log, millions of times: 

WARNING:   out of shared memory
CONTEXT:   xlog redo AccessExclusive locks: xid 2002212 db 16384 rel 1079879

I had seen this on our primary about a year ago and I kept doubling max_locks_per_transaction all the way to 1024, at
whichpoint the problem did not reoccur (on the primary.) 
I still occasionally (once every 3-4 months) see the “out of shared memory” message on the standby although
max_locks_per_transactionhas the same 1024 value as the primary.  When this happens I have to rebuilt it from the
primaryvia pg_basebackup.  When the “out of shared memory” happens on the standby, it’s a coin toss whether the primary
willbehave flaky or not.  This time it did, and the restart fixed it. 

Writing this email made me realize the likely cause of our problem.  It’s the “out of shared memory” issue.  We do have
aplpgsql function that calls 30+ other plpgsql functions, some of which create temp tables.  Some of the calls are
withinloops.  So depending on data inputs we can get hundreds of locks to temp tables within a single transaction.  The
oddthing is that at max_locks_per_transaction = 1024 we no longer get any “out of shared memory” on the primary, but we
doon the stand by.  Any ideas about that?  Should I increase max_locks_per_transaction yet again? 

thanks,
Kiriakos Georgiou

pgsql-admin by date:

Previous
From: Tom Lane
Date:
Subject: Re: is it cool to restart servers as preventive maintenance?
Next
From: Scott Whitney
Date:
Subject: Re: is it cool to restart servers as preventive maintenance?