BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while - Mailing list pgsql-bugs

From postgresql.org@gclough.com
Subject BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while
Date
Msg-id 20151111155006.2644.84564@wrigleys.postgresql.org
Whole thread Raw
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      13770
Logged by:          Greg Clough
Email address:      postgresql.org@gclough.com
PostgreSQL version: 9.4.5
Operating system:   Oracle Linux v6.7
Description:

If you extend the recovery_min_apply_delay on a standby and then restart it,
you cannot connect to it for read-only transactions until the most recent
transaction time >= delay time.  This is problematic if you extend from say
1 hour to 24 hours, as you have to wait almost a day before you can connect
to your Standby again.

I can't find anything in the documentation to describe this as intended
behaviour, and my expectation was that when I extended the delay that the
Standby database would come up and be available for read-only transactions,
but would simply delay any new transactions until they were older than
recovery_min_apply_delay.

I'm curious if this was a design decision to stop people getting confused
when they access a database before recovery_min_apply_delay has expired and
get data that's too recent, or if it was an unintentional consequence of the
way it was implemented.


Environment:

Oracle Linux v6.7
PostgreSQL v9.4.5 (Installed via yum)

postgres@gclough[5432] ~$ uname -a
Linux gclough 3.8.13-98.4.1.el6uek.x86_64 #2 SMP Wed Sep 23 18:46:01 PDT
2015 x86_64 x86_64 x86_64 GNU/Linux

postgres@gclough[5432] ~$ psql --version
psql (PostgreSQL) 9.4.5


Test Case:

1. Create a Primary database on port 5432

export PGDATA=/u01/postgres/9.4/5432/data
export PGPORT=5432
initdb

wal_level = hot_standby                 # minimal, archive, hot_standby, or
logical
max_wal_senders = 1             # max number of walsender processes

2. Replicate it to a Standby database on port 5433

export PGDATA=/u01/postgres/9.4/5433/data
export PGPORT=5433

port = 5433                             # (change requires restart)
wal_level = hot_standby                 # minimal, archive, hot_standby, or
logical
max_wal_senders = 1             # max number of walsender processes
hot_standby = on                        # "on" allows queries during
recovery

3. On the Standby (Port 5433), introduce a 2min time delay with
"recovery_min_apply_delay" in the recovery.conf:

standby_mode = 'on'
primary_conninfo = 'port=5432 host=localhost user=postgres
application_name=port_5433'
recovery_min_apply_delay = 2min

4. Check the current lag on the Standby (Port 5433):

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:32:32 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7000000    | 0/7000000   |                               | t
(1 row)

5. Create a small test table on the Primary (Port 5432):

CREATE TABLE delay_test (id NUMERIC);
INSERT INTO delay_test VALUES (1);

6. Check the application of WAL on the Standby (Port 5433), and note the
delay:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:33:06 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015DC8    | 0/7015890   |                               | t
(1 row)

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:34:00 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015E00    | 0/7015890   |                               | t
(1 row)

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:35:01 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015E00    | 0/7015E00   | 2015-11-11 15:32:53.950376+00 | t
(1 row)


7. Reset the time delay on the Standby from 2 to 5 minutes

recovery_min_apply_delay = 5min

pg_ctl restart -m fast

8. Try to connect to the Standby, and get a failure:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:35:26 GMT 2015
psql: FATAL:  the database system is starting up

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:36:01 GMT 2015
psql: FATAL:  the database system is starting up

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:37:00 GMT 2015
psql: FATAL:  the database system is starting up

9. Wait 3 minutes... then it works:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:38:00 GMT 2015
 receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
 0/7015ED8    | 0/7015ED8   | 2015-11-11 15:32:53.950376+00 | t
(1 row)

pgsql-bugs by date:

Previous
From: jhirsch
Date:
Subject: Re: BUG #13622: unsafe use of relative rpath libpq.5.dylib
Next
From: awasarax@yandex.ru
Date:
Subject: BUG #13768: JSONB concat