Thread: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

From: postgresql.org@gclough.com
The following bug has been logged on the website:

Bug reference:      13770
Logged by:          Greg Clough
Email address:      postgresql.org@gclough.com
PostgreSQL version: 9.4.5
Operating system:   Oracle Linux v6.7
Description:

If you extend the recovery_min_apply_delay on a standby and then restart it,
you cannot connect to it for read-only transactions until the age of the most
recently replayed transaction is >= the delay time.  This is problematic if
you extend from, say, 1 hour to 24 hours, as you have to wait almost a day
before you can connect to your Standby again.

I can't find anything in the documentation to describe this as intended
behaviour, and my expectation was that when I extended the delay, the
Standby database would come up and be available for read-only transactions,
but would simply delay applying any new transactions until they were older
than recovery_min_apply_delay.

I'm curious if this was a design decision to stop people getting confused
when they access a database before recovery_min_apply_delay has expired and
get data that's too recent, or if it was an unintentional consequence of the
way it was implemented.


Environment:

Oracle Linux v6.7
PostgreSQL v9.4.5 (Installed via yum)

postgres@gclough[5432] ~$ uname -a
Linux gclough 3.8.13-98.4.1.el6uek.x86_64 #2 SMP Wed Sep 23 18:46:01 PDT
2015 x86_64 x86_64 x86_64 GNU/Linux

postgres@gclough[5432] ~$ psql --version
psql (PostgreSQL) 9.4.5


Test Case:

1. Create a Primary database on port 5432

export PGDATA=/u01/postgres/9.4/5432/data
export PGPORT=5432
initdb

wal_level = hot_standby                 # minimal, archive, hot_standby, or logical
max_wal_senders = 1                     # max number of walsender processes

2. Replicate it to a Standby database on port 5433

export PGDATA=/u01/postgres/9.4/5433/data
export PGPORT=5433

port = 5433                             # (change requires restart)
wal_level = hot_standby                 # minimal, archive, hot_standby, or logical
max_wal_senders = 1                     # max number of walsender processes
hot_standby = on                        # "on" allows queries during recovery

3. On the Standby (Port 5433), introduce a 2min time delay with
"recovery_min_apply_delay" in the recovery.conf:

standby_mode = 'on'
primary_conninfo = 'port=5432 host=localhost user=postgres application_name=port_5433'
recovery_min_apply_delay = 2min

4. Check the current lag on the Standby (Port 5433):

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:32:32 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7000000    | 0/7000000   |                               | t
(1 row)

5. Create a small test table on the Primary (Port 5432):

CREATE TABLE delay_test (id NUMERIC);
INSERT INTO delay_test VALUES (1);

6. Check the application of WAL on the Standby (Port 5433), and note the
delay:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:33:06 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015DC8    | 0/7015890   |                               | t
(1 row)

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:34:00 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015E00    | 0/7015890   |                               | t
(1 row)

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:35:01 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015E00    | 0/7015E00   | 2015-11-11 15:32:53.950376+00 | t
(1 row)


7. Reset the time delay on the Standby from 2 to 5 minutes, and restart:

recovery_min_apply_delay = 5min

pg_ctl restart -m fast

8. Try to connect to the Standby, and get a failure:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:35:26 GMT 2015
psql: FATAL:  the database system is starting up

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:36:01 GMT 2015
psql: FATAL:  the database system is starting up

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:37:00 GMT 2015
psql: FATAL:  the database system is starting up

9. Wait 3 minutes... then it works:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:38:00 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015ED8    | 0/7015ED8   | 2015-11-11 15:32:53.950376+00 | t
(1 row)

Hi,

I didn't see any response to this, and I do believe it's an actual
PostgreSQL bug... or at least a case of undesirable undocumented behaviour.

*SHORT SUMMARY:  If you are using a Standby with recovery_min_apply_delay =
1h and you increase it to 24h, then your Standby will be unavailable for
the next 23h.*

Could someone confirm my diagnosis?

Thanks.
Greg Clough.


On 13 November 2015 at 09:02, Greg Clough <greg@gclough.com> wrote:

> I may have glossed over how I created the standby, but just for clarity it
> was done with the primary shutdown cleanly:
>
> su - postgres
> export PGPORT=5432
> pg_ctl stop -m fast
> cd /u01/postgres/9.4/5432
> tar cvf - ./data | (cd /u01/postgres/9.4/5433 && tar xvf -)
>
> Regards.
> Greg.
On Mon, Dec 21, 2015 at 7:44 PM, Greg Clough <greg@gclough.com> wrote:
> I didn't see any response to this, and I do believe it's an actual
> PostgreSQL bug... or at least a case of undesirable undocumented behaviour.
>
> SHORT SUMMARY:  If you are using a Standby with recovery_min_apply_delay =
> 1h and you increase it to 24h, then your Standby will be unavailable for the
> next 23h.
>
> Could someone confirm my diagnosis?

When recovery_min_apply_delay is set, replay waits for the configured
delay when it encounters a COMMIT or COMMIT PREPARED record, and it
does so even if the minimum recovery point, which ensures that the
standby is in a consistent state, has not been reached yet. In short,
if there is a COMMIT before the point where the standby thinks it has
become consistent enough to allow read-only queries (which is what you
are after), your application won't be able to connect to the standby,
and the feature behaves correctly.
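
To make that concrete, here is a minimal C sketch of the pre-patch
check, paraphrasing the replay-delay logic of recoveryApplyDelay() in
src/backend/access/transam/xlog.c as of 9.4 (illustrative only: the
helper signatures are simplified, and the real code waits on a latch in
a loop rather than sleeping):

/*
 * Sketch of the 9.4-era replay-delay check (an illustrative paraphrase,
 * not the literal PostgreSQL source).
 */
static bool
recovery_apply_delay_sketch(XLogRecord *record)
{
    TimestampTz commit_time;
    TimestampTz delay_until;
    long        secs;
    int         microsecs;

    /* Nothing to do if no delay is configured. */
    if (recovery_min_apply_delay <= 0)
        return false;

    /*
     * The delay is measured from the commit timestamp; records without
     * one (i.e. anything but COMMIT / COMMIT PREPARED) replay at once.
     */
    if (!getRecordTimestamp(record, &commit_time))
        return false;

    /*
     * Note the absence of any reachedConsistency test here: a commit
     * record sitting before the consistent point is delayed too, which
     * is why the standby can refuse connections for the whole delay.
     */
    delay_until = TimestampTzPlusMilliseconds(commit_time,
                                              recovery_min_apply_delay);
    TimestampDifference(GetCurrentTimestamp(), delay_until,
                        &secs, &microsecs);
    if (secs > 0 || microsecs > 0)
        pg_usleep(secs * 1000000L + microsecs);     /* simplified wait */

    return true;
}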

Depending on the use case, it may be interesting to have a switch
allowing the delay to be skipped when a consistent point has not been
reached, though...
--
Michael
On Wed, Dec 30, 2015 at 10:05 PM, Greg Clough <greg@gclough.com> wrote:
> I figured that if the database was in a consistent state when it was
> shut down with a 1-hour delay, then when it was restarted with a new
> 24-hour delay it would still be consistent and thus available for
> read-only work.  In my test case, no transactions had occurred, but
> the standby still wasn't accessible.

Are you sure about that? What does pg_xlogdump tell you? Based on the
information given upthread, it seems there was at least one transaction
commit between the redo point and the consistent point. When a standby
is restarted, it redoes replay from the same redo point, so it is only
logical that the standby remains unavailable for connections.
> I believe the unavailability of the standby for extended periods if the
> recovery_min_apply_delay is increased will create some confusion, just as it
> confused me initially.  I can see two schools of thought when the parameter
> is increased:
>
> 1. The standby includes transactions that are newer than the
> "recovery_min_apply_delay" setting, so it must be prevented from exposing
> any data to ensure applications don't see data too soon.
> 2. The standby is consistent, so it should be available for read-only
> queries... but any new WAL should not be applied until the commit time is >
> recovery_min_apply_delay.

If the standby only sees a transaction commit after the consistent
point is reached, you will be able to connect anyway.

> Obviously I fall into camp #2, where I expected the database to basically
> pause application of WAL until it passed the recovery_min_apply_delay... but
> still be available for read-only queries.
>
> If the preferred option is #1, then could we introduce a new error message
> so that it's a bit more communicative?  Maybe something like:
>
> psql: FATAL:  the database system has transactions newer than
> recovery_min_apply_delay. Waiting...

Hm. The current error message depends on the state of the database as
reported by the postmaster. It does not seem to me a good idea to
expose a new state at this level for recovery_min_apply_delay.

In short, it seems to me that the correct way to address your concerns
is actually a documentation addition, the point being to mention that
a hot standby can take longer to become available for read-only
connections if there are transaction commits between the redo point
and the consistent point.
--
Michael

Hi Michael,

I figured that if the database was in a consistent state when it was
shut down with a 1-hour delay, then when it was restarted with a new
24-hour delay it would still be consistent and thus available for
read-only work.  In my test case, no transactions had occurred, but the
standby still wasn't accessible.  I think this was compounded by the
error itself being a bit confusing:

psql: FATAL:  the database system is starting up


I believe the unavailability of the standby for extended periods if the
recovery_min_apply_delay is increased will create some confusion, just as
it confused me initially.  I can see two schools of thought when the
parameter is increased:

1. The standby includes transactions that are newer than the
"recovery_min_apply_delay" setting, so it must be prevented from exposing
any data to ensure applications don't see data too soon.

2. The standby is consistent, so it should be available for read-only
queries... but any new WAL should not be applied until the commit time is >
recovery_min_apply_delay.


Obviously I fall into camp #2, where I expected the database to basically
pause application of WAL until it passed the recovery_min_apply_delay...
but still be available for read-only queries.

If the preferred option is #1, then could we introduce a new error message
so that it's a bit more communicative?  Maybe something like:

psql: FATAL:  the database system has transactions newer than
recovery_min_apply_delay. Waiting...


... or something better of your choosing.

Regards.
Greg.

On 2015-12-26 22:45:57 +0900, Michael Paquier wrote:
> Depending on the use case, it may be interesting to have a switch
> allowing the delay to be skipped when a consistent point has not been
> reached, though...

Is there actually any case where it's interesting to delay in that
scenario? I mean that really can only happen if you changed the
configuration to a different delay, or your clock offset
changed. Otherwise we should always reach the consistent point before
the delay plays a role.  I'm tempted to simply only check for delay when
consistent.
Andres Freund <andres@anarazel.de> writes:
> On 2015-12-26 22:45:57 +0900, Michael Paquier wrote:
>> Depending on the use case, it may be interesting to have a switch
>> allowing the delay to be skipped when a consistent point has not been
>> reached, though...

> Is there actually any case where it's interesting to delay in that
> scenario? I mean that really can only happen if you changed the
> configuration to a different delay, or your clock offset
> changed. Otherwise we should always reach the consistent point before
> the delay plays a role.  I'm tempted to simply only check for delay when
> consistent.

The argument for having a delay at all is to allow backing up to some
earlier point in the master's history; but a slave that is not yet
consistent cannot provide any rollback/recovery option.  The slave is
completely useless for any purpose until it reaches consistency, so
it might as well do that as fast as possible, and then sit on the
next WAL record until the delay is met.  +1 for no delay at all when
not consistent.

            regards, tom lane
On Thu, Dec 31, 2015 at 12:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The argument for having a delay at all is to allow backing up to some
> earlier point in the master's history; but a slave that is not yet
> consistent cannot provide any rollback/recovery option.  The slave is
> completely useless for any purpose until it reaches consistency, so
> it might as well do that as fast as possible, and then sit on the
> next WAL record until the delay is met.  +1 for no delay at all when
> not consistent.

OK, I don't mind doing so if you guys think that's more appropriate.
From reading the code, though, it seems that this was done
deliberately, so that the delay is taken into account even before the
node is consistent. Hence ISTM that it would be good as well to get
feedback from people who actually worked on the feature before
deciding anything. Robert, Simon?
--
Michael

I like the idea of replaying up to a consistent point, and then pausing
any new transactions until recovery_min_apply_delay has passed... but
maybe my original problem is OS specific, as I tried PostgreSQL v9.4.5
on 64-bit CentOS and the standby database opens just fine... even when
the recovery_min_apply_delay is changed from 1 hour to 24 hours.  I'm
downloading Oracle Linux v6.7 to replicate my previous environment, and
I'll report back with my findings.


On 30 December 2015 at 13:44, Michael Paquier <michael.paquier@gmail.com>
wrote:

> Are you sure about that? What does pg_xlogdump tell you? Based on the
> information given upthread, it seems there was at least one
> transaction commit between the redo point and the consistent point.
> When a standby is restarted, it redoes replay from the same redo
> point, so it is only logical that the standby remains unavailable for
> connections.

I didn't run pg_xlogdump... but I did check the timestamp of the most
recently applied transaction, and it does not change before/after the
increase in recovery_min_apply_delay.  This was done on a private test
environment, so there is very little chance I accidentally did a
transaction.  The xlog pointer seems to change, but the replay
timestamp is identical... so what would cause the xlog to increase
without actually being a transaction?

I had always put the difference (7015E00 vs. 7015ED8) down to the
shutdown/startup writing something like an "end of file" marker into
the WAL during the shutdown... but that's purely a guess, as I don't
know the internals of how it really works:

*BEFORE*
Wed Nov 11 15:35:01 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015E00    | 0/7015E00   | 2015-11-11 15:32:53.950376+00 | t
(1 row)

*AFTER*

Wed Nov 11 15:38:00 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015ED8    | 0/7015ED8   | 2015-11-11 15:32:53.950376+00 | t
(1 row)



> If the standby only sees a transaction commit after the consistent
> point is reached, you will be able to connect anyway.
>
> Hm. The current error message depends on the state of the database as
> reported by the postmaster. It does not seem to me a good idea to
> expose a new state at this level for recovery_min_apply_delay.
>
> In short, it seems to me that the correct way to address your concerns
> is actually a documentation addition, the point being to mention that
> a hot standby can take longer to become available for read-only
> connections if there are transaction commits between the redo point
> and the consistent point.

OK, that makes sense.  It appears that these pages would be affected, but
as per my testing above... it doesn't seem to be an issue on CentOS like it
was on Oracle Linux:

http://www.postgresql.org/docs/9.4/static/standby-settings.html
http://www.postgresql.org/docs/9.5/static/standby-settings.html

I'll diagnose it on Oracle Linux vs. CentOS first, and then report back.

Cheers.
Greg.
On Thu, Dec 31, 2015 at 8:13 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> OK, I don't mind doing so if you guys think that's more appropriate.
> From reading the code, though, it seems that this was done
> deliberately, so that the delay is taken into account even before the
> node is consistent.

Changing my mind after more thoughts on the matter: it would indeed
make more sense to apply the delay only once the database has reached
a consistent state, so that read-only operations are immediately
possible on a standby instead of having to wait for consistency for
perhaps a couple of hours. Please see attached a patch to do that.
--
Michael

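The attachment itself is not reproduced here, but a minimal C sketch of
the shape such a fix could take, under the assumption discussed
upthread (an early exit from the replay-delay check until consistency
is reached); this is an illustration of the idea, not the attached
patch:

/*
 * Illustrative sketch, not the attached patch: skip the replay delay
 * entirely until the standby has reached a consistent state, so that
 * it opens for read-only connections first and only then starts
 * honoring recovery_min_apply_delay.
 */
static bool
recovery_apply_delay_sketch(XLogRecord *record)
{
    TimestampTz commit_time;

    /* Nothing to do if no delay is configured. */
    if (recovery_min_apply_delay <= 0)
        return false;

    /*
     * Proposed addition: no delay while the standby is not yet
     * consistent (reachedConsistency is the existing xlog.c flag).
     */
    if (!reachedConsistency)
        return false;

    /* Only commit records carry the timestamp the delay applies to. */
    if (!getRecordTimestamp(record, &commit_time))
        return false;

    /* ... existing wait-until-timestamp logic unchanged ... */
    return true;
}
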
On Sat, Jan 2, 2016 at 9:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Changing my mind after more thoughts on the matter: it would indeed
> make more sense to apply the delay only once the database has reached
> a consistent state, so that read-only operations are immediately
> possible on a standby instead of having to wait for consistency for
> perhaps a couple of hours. Please see attached a patch to do that.

CF entry added here:
https://commitfest.postgresql.org/9/478/
--
Michael
On 02/01/2016 13:14, Michael Paquier wrote:
> Changing my mind after more thoughts on the matter: it would indeed
> make more sense to apply the delay only once the database has reached
> a consistent state, so that read-only operations are immediately
> possible on a standby instead of having to wait for consistency for
> perhaps a couple of hours. Please see attached a patch to do that.

I just reviewed the patch. It's pretty straightforward and works as
intended, so I mark it as ready for committer.


--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
On Thu, Mar 3, 2016 at 7:55 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> I just reviewed the patch. It's pretty straightforward and works as
> intended, so I mark it as ready for committer.

Thanks for the review!
--
Michael
On Thu, Mar 3, 2016 at 9:26 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Mar 3, 2016 at 7:55 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> I just reviewed the patch. It's pretty straightforward and works as
>> intended, so I mark it as ready for committer.

The patch looks good to me.

Should we backpatch this? I'm tempted to do that because it seems like
an oversight. It's a user-visible change of behavior, though.

Regards,

--
Fujii Masao
On 03/03/2016 15:42, Fujii Masao wrote:
> Should we backpatch this? I'm tempted to do that because it seems like
> an oversight. It's a user-visible change of behavior, though.

The previous user-visible situation was a standby that stays unavailable
for no good reason, so +1 for backpatching it.



--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
On 2016-03-03 23:42:25 +0900, Fujii Masao wrote:
> Should we backpatch this? I'm tempted to do that because it seems like
> an oversight. It's a user-visible change of behavior, though.

I agree we should.

Andres
On Fri, Mar 4, 2016 at 2:35 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-03-03 23:42:25 +0900, Fujii Masao wrote:
>> Should we backpatch this? I'm tempted to do that because it seems like
>> an oversight. It's a user-visible change of behavior, though.
>
> I agree we should.

We definitely need to backpatch.
--
Michael
On Fri, Mar 4, 2016 at 11:57 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> We definitely need to backpatch.

Yeah, we should backpatch to 9.4, where the parameter was added.
Pushed. Thanks!

Regards,

--
Fujii Masao