Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while - Mailing list pgsql-bugs

From Greg Clough
Subject Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while
Date
Msg-id CADjwvTM_SvRtS0_8Tw7nypUN3bpPU-6gOKH4sdfz4+xbmPRo3Q@mail.gmail.com
Whole thread Raw
In response to Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-bugs
I like the idea of replaying up to a consistent point, and then pausing any
new transactions until recovery_min_apply_delay has passed... but maybe my
original problem is OS specific, as I tried PostgreSQL v9.4.5 on CentOS
64-bit the standby database opens just fine... even when
the recovery_min_apply_delay is changed from 1 hour to 24 hours.  I'm
downloading Oracle Linux v6.7 to replicate my previous environment, and
I'll report back with my findings.


On 30 December 2015 at 13:44, Michael Paquier <michael.paquier@gmail.com>
wrote:

> On Wed, Dec 30, 2015 at 10:05 PM, Greg Clough <greg@gclough.com> wrote:
> > I figured that if the database was in a consistent state when it was
> > shutdown with a 1 hour delay, then when it was restarted with a new 24
> hour
> > delay then it would also be consistent and thus available for read-only
> > work.  In my test case, no transactions have occurred but the standby
> still
> > wasn't accessible.
>
> Are you sure about that? What does pg_xlogdump tells you? It seems
> based on the information given upthread that there was at least one
> transaction commit between the redo point and the consistent point.
> When a standby is restarted, it would redo the replay from the last
> same redo point, so that's just logic that the standby keeps being
> unavailable for connections.
>

I didn't run pg_xlogdump... but I did check the timestamps of the most
recently applied transaction, and it does not change before/after the
increase in recovery_min_apply_delay.  This was done on a private test
environment, so there is very little chance I accidentally did a
transaction.  The xlog pointer seem to change, but the replay timestamp is
identical... so what would cause the xlog to increase, but not actually be
a transaction?

I had always put the difference (7015E00 vs. 7015ED8) down to the
shutdown/startup writing something like a "end of file" into the WAL during
the shutdown... but that's purely a guess, as I don't know the internals of
how it really works:

*BEFORE*
Wed Nov 11 15:35:01 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015E00    | 0/7015E00   | 2015-11-11 15:32:53.950376+00 | t
(1 row)

*AFTER*

Wed Nov 11 15:38:00 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015ED8    | 0/7015ED8   | 2015-11-11 15:32:53.950376+00 | t
(1 row)



.
> > I believe the unavailability of the standby for extended periods if the
> > recovery_min_apply_delay is increased will create some confusion, just
> as it
> > confused me initially.  I can see two schools of thought when the
> parameter
> > is increased:
> >
> > 1. The standby includes transactions that are newer than the
> > "recovery_min_apply_delay" setting, so it must be prevented from exposing
> > any data to ensure applications don't see data too soon.
> > 2. The standby is consistent, so it should be available for read-only
> > queries... but any new WAL should not be applied until the commit time
> is >
> > recovery_min_apply_delay.
>
> If the standby has a transaction commit only after a consistent point
> is reached, you will be good to connect anyway.
>
> > Obviously I fall into camp #2, where I expected the database to basically
> > pause application of WAL until it passed the recovery_min_apply_delay...
> but
> > still be available for read-only queries.
> >
> > If the preferred option is #1, then could we introduce a new error
> message
> > so that it's a bit more communicative.  Maybe something like:
> >
> > psql: FATAL:  the database system has transactions newer than
> > recover_min_apply_delay. Waiting...
>
> Hm. The current error message depends on the state of the database
> reported by the postmaster. It does not seem to me a good idea to
> expose a new state at this level regarding recover_min_apply_delay.
>
> In short, it seems to me that the correct way to address your concerns
> is actually a documentation addition, the point being to mention that
> a hot standby would take a longer time to become available for
> read-only connections, aka allowing read-only connections if there are
> transaction commits that happened between the redo point and the
> consistent point.
> --
> Michael
>

OK, that makes sense.  It appears that these pages would be affected, but
as per my testing above... it doesn't seem to be an issue on CentOS like it
was on Oracle Linux:

http://www.postgresql.org/docs/9.4/static/standby-settings.html
http://www.postgresql.org/docs/9.5/static/standby-settings.html

I'll diagnose it on Oracle Linux vs. CentOS first, and then report back.

Cheers.
Greg.

pgsql-bugs by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: BUG #13783: 'create database test owner testowner' as 'postgres' leaves test.public owned by postgres
Next
From: Ali Raza
Date:
Subject: Re: BUG #13839: command line restart server fail