Re: Hot standby, slot ids and stuff - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Hot standby, slot ids and stuff
Msg-id 1231506512.18005.432.camel@ebony.2ndQuadrant
In response to Re: Hot standby, slot ids and stuff  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Fri, 2009-01-09 at 14:38 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
> >> I mean the standby should stop trying to track the in progress 
> >> transactions in recovery procs, and apply the WAL records like it does 
> >> before the consistent state is reached.
> > 
> > ...
> > 
> > So, if we don't PANIC, how should we behave?
> > 
> > Without full information on running-xacts we would be unable to take a
> > snapshot, so should:
> > * backends be forcibly disconnected?
> > * backends hang waiting for snapshot info to become available again
> > within X minutes' worth of WAL time?
> > * backends throw an ERROR:  unable to provide snapshot at this time,
> > DETAIL: retry your statement later. 
> > ...other alternatives
> > 
> > and possibly prevent new connections.
> 
> All of those seem reasonable to me. The 2nd option seems nicest, "X 
> minutes" should probably be controlled by max_standby_delay, after which 
> you can throw an error.

Hmm, we use the recovery procs to track only transactions that have
TransactionIds assigned. That means we will overflow only if close to
100% of transactions are write transactions at some point, or if we have
more write transactions in progress than we have max_connections on the
standby.

So it sounds like the overflow situation would be rare and, when it
does occur, would probably not last long.
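A minimal Python model of that overflow condition, assuming each relevant WAL event is either an xid assignment or the commit/abort of that xid (the event tuples and the function name would_overflow are hypothetical, not code from the patch). Read-only transactions never receive an xid, so they never appear in the stream; only concurrent write transactions count against the standby's recovery procs:

```python
from typing import Iterable, Tuple


def would_overflow(events: Iterable[Tuple[str, int]], standby_max_connections: int) -> bool:
    """Return True if in-progress xids ever exceed the standby's recovery procs.

    Hypothetical event format:
      ("assign", xid): a transaction on the primary was assigned an xid
      ("end", xid):    its commit or abort record was replayed
    Read-only transactions never get an xid, so they never appear here.
    """
    in_progress: set = set()
    for kind, xid in events:
        if kind == "assign":
            in_progress.add(xid)
            if len(in_progress) > standby_max_connections:
                return True  # no free recovery proc for this xid
        elif kind == "end":
            in_progress.discard(xid)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return False


# Two recovery procs on the standby; at most two writers overlap here.
print(would_overflow([("assign", 100), ("assign", 101), ("end", 100), ("assign", 102)], 2))  # False
# Three overlapping writers would need three procs.
print(would_overflow([("assign", 100), ("assign", 101), ("assign", 102)], 2))  # True
```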

> If we care enough, we could also keep tracking the transactions in 
> backend-private memory of the startup process, until there's enough room 
> in proc array. That would make the outage shorter, because you wouldn't 
> have to wait until the next running-xacts record, but only until enough 
> transactions have finished that they all fit in proc array again.
> 
> But whatever is the simplest, really.

The above does sound best, since snapshots would then hang only for a
short period. But at this stage of the game, it is also more complex.
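Heikki's suggestion can be sketched as a simplified, single-threaded model: proc_slots stands in for the shared proc array and spill for the startup process's private memory (SpillingTracker, SnapshotUnavailable and both attribute names are hypothetical). Snapshots fail only while something is spilled, and the outage ends as soon as enough transactions finish, without waiting for a running-xacts record:

```python
class SnapshotUnavailable(Exception):
    """Hypothetical stand-in for 'ERROR: unable to provide snapshot'."""


class SpillingTracker:
    """Simplified model: fixed-size proc array plus a private overflow list.

    proc_slots models the shared proc array on the standby;
    spill models private memory in the startup process.
    Invariant: spill is non-empty only while proc_slots is full.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.proc_slots = set()
        self.spill = []  # xids that did not fit, oldest first

    def assign(self, xid: int) -> None:
        if len(self.proc_slots) < self.capacity:
            self.proc_slots.add(xid)
        else:
            self.spill.append(xid)  # keep tracking, just not in shared memory

    def end(self, xid: int) -> None:
        if xid in self.proc_slots:
            self.proc_slots.remove(xid)
        elif xid in self.spill:
            self.spill.remove(xid)
        # Move spilled xids into the proc array as soon as there is room.
        while self.spill and len(self.proc_slots) < self.capacity:
            self.proc_slots.add(self.spill.pop(0))

    def snapshot(self) -> frozenset:
        if self.spill:
            raise SnapshotUnavailable("unable to provide snapshot at this time")
        return frozenset(self.proc_slots)


t = SpillingTracker(capacity=2)
for xid in (1, 2, 3):
    t.assign(xid)  # xid 3 spills into private memory
try:
    t.snapshot()
except SnapshotUnavailable as e:
    print("ERROR:", e)  # ERROR: unable to provide snapshot at this time
t.end(1)  # frees a slot, so xid 3 moves into the proc array
print(sorted(t.snapshot()))  # [2, 3]
```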

For now, though, since it looks like overflow would happen fairly
rarely, I'd opt for the simplest option: throw an ERROR.
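For comparison, a sketch of the simplest option. It assumes (from Heikki's remark about waiting for the next running-xacts record) that once tracking overflows, the standby discards its partial state and rebuilds it from the next running-xacts record. SimpleTracker, running_xacts() and SnapshotUnavailable are hypothetical names; the exception stands in for the ERROR a backend would report:

```python
class SnapshotUnavailable(Exception):
    """Hypothetical stand-in for 'ERROR: unable to provide snapshot'."""


class SimpleTracker:
    """Simplest option: on overflow, give up until the next running-xacts record."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.running = set()
        self.overflowed = False

    def assign(self, xid: int) -> None:
        if self.overflowed:
            return  # state is already incomplete; nothing useful to record
        if len(self.running) >= self.capacity:
            self.overflowed = True
            self.running.clear()  # partial information is useless for snapshots
            return
        self.running.add(xid)

    def end(self, xid: int) -> None:
        self.running.discard(xid)

    def running_xacts(self, xids) -> None:
        # Assumed: the record lists every xid in progress on the primary,
        # so it can replace the tracked state wholesale if it fits.
        if len(xids) <= self.capacity:
            self.running = set(xids)
            self.overflowed = False
        else:
            self.running.clear()
            self.overflowed = True

    def snapshot(self) -> frozenset:
        if self.overflowed:
            raise SnapshotUnavailable(
                "unable to provide snapshot at this time; retry your statement later")
        return frozenset(self.running)


t = SimpleTracker(capacity=2)
for xid in (1, 2, 3):
    t.assign(xid)  # xid 3 overflows and tracking is abandoned
t.end(1)  # still no snapshots: we have lost track
t.running_xacts([2, 3])  # resynchronize from WAL
print(sorted(t.snapshot()))  # [2, 3]
```

Unlike the spilling variant, the outage here lasts until the next running-xacts record arrives, even if transactions finish sooner.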

> > If max_connections is higher on the primary, then the standby will *never* be
> > available for querying. Should we have multiple ERRORs depending upon
> > whether the situation is hopefully-temporary or looks-permanent?
> > 
> > Don't assume I want the PANIC. That clearly needs to be revisited if we
> > change slotids. 
> 
> It needs to be revisited whether we change slotids or not, IMHO.
> 
> Note that with slotids, you have a problem as soon as any of the slots 
> that don't exist on standby are used, regardless of how many concurrent 
> transactions there actually are. Without slots you only have a problem if
> you really have more than the standby's max_connections concurrent
> transactions. That makes a big difference in practice.

Sometimes, but mostly people set max_connections higher because they
intend to use those extra connections. So no real advantage there
over the slotid approach :-)
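The difference Heikki describes, and the counterpoint above, can be shown with two hypothetical predicates. Assumption: slot ids are 0-based indexes into the primary's proc array, and the standby has exactly standby_max_connections recovery procs:

```python
from typing import Sequence


def slot_scheme_fails(active_slot_ids: Sequence[int], standby_max_connections: int) -> bool:
    # With slot ids, each primary slot maps to the recovery proc with the same
    # index, so any slot id beyond the standby's array has nowhere to go.
    return any(slot >= standby_max_connections for slot in active_slot_ids)


def count_scheme_fails(active_slot_ids: Sequence[int], standby_max_connections: int) -> bool:
    # Without slot ids, only the number of concurrent write transactions matters.
    return len(active_slot_ids) > standby_max_connections


# Primary max_connections = 200, standby max_connections = 100.
print(slot_scheme_fails([5, 150], 100), count_scheme_fails([5, 150], 100))  # True False
busy = list(range(150))  # the extra connections actually in use for writes
print(slot_scheme_fails(busy, 100), count_scheme_fails(busy, 100))  # True True
```

Two writers in primary slots 5 and 150 already break the slot-id scheme but not the count-based one; once the extra connections really are used, both fail.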

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


