Re: Hot standby, slot ids and stuff - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Hot standby, slot ids and stuff |
Date | |
Msg-id | 1231506512.18005.432.camel@ebony.2ndQuadrant |
In response to | Re: Hot standby, slot ids and stuff (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
List | pgsql-hackers |
On Fri, 2009-01-09 at 14:38 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
> >> I mean the standby should stop trying to track the in progress
> >> transactions in recovery procs, and apply the WAL records like it does
> >> before the consistent state is reached.
> >
> > ...
> >
> > So, if we don't PANIC, how should we behave?
> >
> > Without full information on running-xacts we would be unable to take a
> > snapshot, so should:
> > * backends be forcibly disconnected?
> > * backends hang waiting for snapshot info to be re-available again in X
> > minutes worth of WAL time?
> > * backends throw an ERROR: unable to provide snapshot at this time,
> > DETAIL: retry your statement later.
> > ...other alternatives
> >
> > and possibly prevent new connections.
>
> All of those seem reasonable to me. The 2nd option seems nicest, "X
> minutes" should probably be controlled by max_standby_delay, after which
> you can throw an error.

Hmm, we use the recovery procs to track transactions that have
TransactionIds assigned. That means we will overflow only if we approach
100% write transactions at any time, or if we have more write
transactions in progress than we have max_connections on standby. So it
sounds like the overflow situation would probably be both rare and, if
it did occur, may not occur for long periods.

> If we care enough, we could also keep tracking the transactions in
> backend-private memory of the startup process, until there's enough room
> in proc array. That would make the outage shorter, because you wouldn't
> have to wait until the next running-xacts record, but only until enough
> transactions have finished that they all fit in proc array again.
>
> But whatever is the simplest, really.

The above does sound best since it would allow us to have the snapshot
hang for a short period. But at this stage of the game, more complex.
For now though, since it looks like it would happen fairly rarely, I'd
opt for the simplest: throw an ERROR.

> > If max_connections is higher on primary then the standby will *never* be
> > available for querying. Should we have multiple ERRORs depending upon
> > whether the situation is hopefully-temporary or looks-permanent?
> >
> > Don't assume I want the PANIC. That clearly needs to be revisited if we
> > change slotids.
>
> It needs to be revisited whether we change slotids or not, IMHO.
>
> Note that with slotids, you have a problem as soon as any of the slots
> that don't exist on standby are used, regardless of how many concurrent
> transactions there actually is. Without slots you only have a problem if
> you really have more than standby's max_connections concurrent
> transactions. That makes a big difference in practice.

Sometimes, but mostly people set max_connections higher because they
intend to use those extra connections. So no real advantage there
against the slotid approach :-)

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
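
For illustration only, the "simplest" behaviour discussed above (stop tracking on overflow, refuse snapshots with an ERROR, and recover at the next running-xacts record) could be modelled roughly as below. This is a toy Python sketch, not PostgreSQL code: the names RecoveryProcTracker, SnapshotUnavailable, xid_assigned, xact_finished and running_xacts_record are hypothetical, and it assumes the tracking capacity equals the standby's max_connections.

```python
class SnapshotUnavailable(Exception):
    """Stands in for the ERROR a backend would raise (hypothetical name)."""


class RecoveryProcTracker:
    """Toy model of tracking in-progress write transactions on a standby.

    Assumption: capacity equals the standby's max_connections.
    """

    def __init__(self, standby_max_connections):
        self.capacity = standby_max_connections
        self.running = set()      # xids currently tracked
        self.overflowed = False   # True once tracking information was lost

    def xid_assigned(self, xid):
        """A WAL record shows that a transaction now has a TransactionId."""
        if self.overflowed or xid in self.running:
            return                # suspended, or already tracked
        if len(self.running) >= self.capacity:
            # No room left: give up tracking instead of PANICking.
            self.overflowed = True
            self.running.clear()
            return
        self.running.add(xid)

    def xact_finished(self, xid):
        """A commit or abort record was replayed for this xid."""
        self.running.discard(xid)

    def running_xacts_record(self, xids):
        """Resync from a full list of running xids, if it fits."""
        xids = set(xids)
        if len(xids) <= self.capacity:
            self.running = xids
            self.overflowed = False
        else:
            self.running.clear()
            self.overflowed = True

    def take_snapshot(self):
        """Return the set of running xids, or raise if it is unknown."""
        if self.overflowed:
            raise SnapshotUnavailable(
                "unable to provide snapshot at this time; "
                "retry your statement later"
            )
        return frozenset(self.running)


if __name__ == "__main__":
    tracker = RecoveryProcTracker(standby_max_connections=2)
    tracker.xid_assigned(100)
    tracker.xid_assigned(101)
    print(sorted(tracker.take_snapshot()))
    tracker.xid_assigned(102)          # third concurrent writer: overflow
    try:
        tracker.take_snapshot()
    except SnapshotUnavailable as exc:
        print("ERROR:", exc)
    tracker.running_xacts_record([102])
    print(sorted(tracker.take_snapshot()))
```

In this model the outage lasts until the next running-xacts record that fits. The alternative of keeping the overflow in startup-process private memory would shorten that window, at the cost of the extra complexity mentioned above.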