Re: recovery_connections cannot start (was Re: master in standby mode croaks) - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject Re: recovery_connections cannot start (was Re: master in standby mode croaks)
Msg-id 4BD581C5.70301@kaltenbrunner.cc
In response to Re: recovery_connections cannot start (was Re: master in standby mode croaks)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas wrote:
> On Fri, Apr 23, 2010 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Well, I think the real hole is that turning archive_mode=on results in
>>> WAL never being deleted unless it's successfully archived.
>> Hm, good point.  And at least in principle you could have SR setups
>> that don't care about having a backing WAL archive.
>>
>>> But we might be able to handle that like this:
>>> wal_mode={standby|archive|crash}  # or whatever
>>> wal_segments_always=<integer>   # keep this many segments always, for
>>> SR - like current wal_keep_segments
>>> wal_segments_unarchived=<integer> # keep this many unarchived
>>> segments, -1 for infinite
>>> max_wal_senders=<integer>          # same as now
>>> archive_command=<string>            # same as now
>>> So we always retain wal_segments_always segments, but if we have
>>> trouble with archiving we'll retain up to wal_segments_unarchived.
>> And when that limit is reached, what happens?  Panic shutdown?
>> Silently drop unarchived data?  Neither one sounds very good.
> 
> Silently drop unarchived data.  I agree that isn't very good, but
> think about it this way: if archive_command is failing, then our log
> shipping slave is not going to work.  But letting the disk fill up on
> the primary does not make it any better.  It just makes the primary
> stop working, too.  Obviously, all of this stuff needs to be monitored
> or you're playing with fire, but I don't think having a safety valve
> on the primary is a stupid idea.

Hmm, I'm not sure I agree. You need to monitor disk space usage on a
system in general, for obvious reasons, and I don't think dealing with
that kind of thing is really in our realm. We are a relational database
and we need to guard the data; silently dropping data is IMHO not a good
idea.
Just picture the typical scenario: a sysadmin does nighttime maintenance
on the standby while some batch jobs on the master generate just enough
WAL to exceed the limit. All that achieves is that the sysadmin has to
call the DBA in.
In general, the question really is "will people set this to something
sensible, or rather to an absurdly high value just to make sure their
replication never breaks?" I guess people will do the latter in
critical environments...
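
To make the behaviour under discussion concrete, here is a rough sketch of the retention rule from the quoted proposal. It is only an illustration under stated assumptions, not how the server works: the Segment type and the segments_to_remove function are hypothetical, the parameter names wal_segments_always and wal_segments_unarchived come from the proposal and are not existing settings, and the proposal does not say whether the unarchived limit counts segments inside the always-kept window (this sketch assumes it does not).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    number: int     # position in the WAL stream, ascending
    archived: bool  # True once archive_command has succeeded for it


def segments_to_remove(segments, wal_segments_always, wal_segments_unarchived):
    """Return the segment numbers the proposed policy would discard.

    Hypothetical model of the proposal in the quoted message:
    - the newest wal_segments_always segments are always kept;
    - older segments that are already archived may be removed;
    - older unarchived segments are kept up to wal_segments_unarchived
      (-1 means no limit); beyond that the oldest ones are dropped.
    """
    ordered = sorted(segments, key=lambda s: s.number)
    cutoff = len(ordered) - wal_segments_always
    candidates = ordered[:cutoff] if cutoff > 0 else []

    removable = [s.number for s in candidates if s.archived]
    unarchived = [s for s in candidates if not s.archived]

    if wal_segments_unarchived >= 0 and len(unarchived) > wal_segments_unarchived:
        excess = len(unarchived) - wal_segments_unarchived
        # This is the "silently drop unarchived data" case.
        removable.extend(s.number for s in unarchived[:excess])

    return sorted(removable)


# Archiving worked for segments 1-3 and has been failing since segment 4.
wal = [Segment(n, archived=(n <= 3)) for n in range(1, 11)]
dropped = segments_to_remove(wal, wal_segments_always=2, wal_segments_unarchived=4)
kept_forever = segments_to_remove(wal, wal_segments_always=2, wal_segments_unarchived=-1)
```

With a limit of 4, the oldest unarchived segment is discarded along with the archived ones; with -1 only archived segments are ever removed, which is the "absurdly high value" case where the disk fills up instead.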


Stefan

