Re: Hot Standby startup with overflowed snapshots - Mailing list pgsql-hackers

From Chris Redekop
Subject Re: Hot Standby startup with overflowed snapshots
Msg-id CAC2SuRLM07gseDBeyqTL2AfkQmOHvkcNvpBV_qRxoPkO76-FwA@mail.gmail.com
In response to Re: Hot Standby startup with overflowed snapshots  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Hot Standby startup with overflowed snapshots
List pgsql-hackers
Sorry... "designed" was a poor choice of words; I meant "not unexpected". Doing the checkpoint right after pg_stop_backup() looks like it will work perfectly for me, so thanks for all your help!
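
For anyone who wants to script the same workaround, here is a minimal sketch of "checkpoint right after pg_stop_backup()", assuming the backup is driven from a script that can reach the primary with psql. The helper names (psql_command, finish_base_backup) and the "postgres" database name are placeholders, not anything from this thread; only pg_stop_backup() and the CHECKPOINT command come from the discussion. The two statements are sent as separate psql calls so that the checkpoint is requested only after the backup has been stopped.

```python
import subprocess

# SQL run on the primary once the file copy of the base backup is done.
# pg_stop_backup() is the function named in this thread; CHECKPOINT is
# the standard SQL command that forces an immediate checkpoint.
END_OF_BACKUP_SQL = ["SELECT pg_stop_backup();", "CHECKPOINT;"]


def psql_command(sql, dbname="postgres"):
    """Build one psql invocation for a single statement.

    dbname is a placeholder; connection options (host, user) are left out
    and would come from the environment, e.g. PGHOST and PGUSER.
    """
    return ["psql", "-X", "-v", "ON_ERROR_STOP=1", "-d", dbname, "-c", sql]


def finish_base_backup(dbname="postgres", run=subprocess.run):
    """Stop the backup, then force a checkpoint, as two separate calls.

    `run` is injectable so the sequence can be dry-run without a server.
    """
    for sql in END_OF_BACKUP_SQL:
        run(psql_command(sql, dbname), check=True)
```

A dry run that only prints the commands, without touching a server, would be `finish_base_backup(run=lambda cmd, check: print(cmd))`.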

On a side note, I am sporadically seeing another error on hot standby startup. I'm not terribly concerned about it, as it is pretty rare and it works on a retry, so it's not a big deal. The error is "FATAL:  out-of-order XID insertion in KnownAssignedXids". If you think it might be a bug and are interested in hunting it down, let me know and I'll help any way I can... but if you're not too worried about it then neither am I :)


On Thu, Oct 27, 2011 at 4:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop <chris@replicon.com> wrote:

> hrmz, still basically the same behaviour.  I think it might be a *little*
> better with this patch.  Before when under load it would start up quickly
> maybe 2 or 3 times out of 10 attempts....with this patch it might be up to 4
> or 5 times out of 10...ish...or maybe it was just fluke *shrug*.  I'm still
> only seeing your log statement a single time (I'm running at debug2).  I
> have discovered something though - when the standby is in this state if I
> force a checkpoint on the primary then the standby comes right up.  Is there
> anything I can check or try for you to help figure this out?....or is it
> actually as designed that it could take 10-ish minutes to start up even
> after all clients have disconnected from the primary?

Thanks for testing. The improvements cover specific cases, so it's not
subject to chance; it's not a performance patch.

It's not "designed" to act the way you describe, but it does.

The reason this occurs is that you have a transaction-heavy workload
with occasional periods of complete quiet and a base backup time that
is much less than checkpoint_timeout. If your base backup were slower,
the checkpoint would have hit naturally before recovery had reached a
consistent state. That combination seems fairly atypical; I guess
you're doing this on a test system.

It seems cheap to add in a call to LogStandbySnapshot() after each
call to pg_stop_backup().

Does anyone think this case is worth adding code for? Seems like one
more thing to break.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
