Thread: pg_last_xact_replay_timestamp lies
Hello, dear guru =) help me to fix situation when no change made on primary and standby show lag more than 15 minutes:
master :
postgres=# select * from pg_stat_replication
;
-[ RECORD 1 ]----+------------------------------
pid | 18553
usesysid | 117942
usename | replication
application_name | walreceiver
client_addr | 10.62.43.60
client_hostname |
client_port | 45281
backend_start | 2015-06-15 00:02:00.095195+03
state | streaming
sent_location | 62/BC000000
write_location | 62/BC000000
flush_location | 62/BC000000
replay_location | 62/BC000000
sync_priority | 0
sync_state | async
standby:
postgres=# SELECT (extract(epoch from now() - pg_last_xact_replay_timestamp() ))::int AS slave_lag;
slave_lag
-----------
1030
on master :
postgres=# select name,setting from pg_settings where lower (category) like '%write%log%';
name | setting
------------------------------+-------------------------------------------------------------------
archive_command | cp -i %p /var/lib/postgresql/9.3/main/pg_archivelog/%f </dev/null
archive_mode | on
archive_timeout | 600
checkpoint_completion_target | 0.7
checkpoint_segments | 32
checkpoint_timeout | 300
checkpoint_warning | 30
commit_delay | 0
commit_siblings | 5
fsync | on
full_page_writes | on
synchronous_commit | on
wal_buffers | 2048
wal_level | hot_standby
wal_sync_method | fdatasync
wal_writer_delay | 200
(16 rows)
master :
postgres=# select * from pg_stat_replication
;
-[ RECORD 1 ]----+------------------------------
pid | 18553
usesysid | 117942
usename | replication
application_name | walreceiver
client_addr | 10.62.43.60
client_hostname |
client_port | 45281
backend_start | 2015-06-15 00:02:00.095195+03
state | streaming
sent_location | 62/BC000000
write_location | 62/BC000000
flush_location | 62/BC000000
replay_location | 62/BC000000
sync_priority | 0
sync_state | async
standby:
postgres=# SELECT (extract(epoch from now() - pg_last_xact_replay_timestamp() ))::int AS slave_lag;
slave_lag
-----------
1030
on master :
postgres=# select name,setting from pg_settings where lower (category) like '%write%log%';
name | setting
------------------------------+-------------------------------------------------------------------
archive_command | cp -i %p /var/lib/postgresql/9.3/main/pg_archivelog/%f </dev/null
archive_mode | on
archive_timeout | 600
checkpoint_completion_target | 0.7
checkpoint_segments | 32
checkpoint_timeout | 300
checkpoint_warning | 30
commit_delay | 0
commit_siblings | 5
fsync | on
full_page_writes | on
synchronous_commit | on
wal_buffers | 2048
wal_level | hot_standby
wal_sync_method | fdatasync
wal_writer_delay | 200
(16 rows)
On Mon, Jun 15, 2015 at 8:30 AM, Anton Bushmelev <djeday84@gmail.com> wrote: > Hello, dear guru =) help me to fix situation when no change made on primary > and standby show lag more than 15 minutes: > master : > postgres=# select * from pg_stat_replication > ; > -[ RECORD 1 ]----+------------------------------ > pid | 18553 > usesysid | 117942 > usename | replication > application_name | walreceiver > client_addr | 10.62.43.60 > client_hostname | > client_port | 45281 > backend_start | 2015-06-15 00:02:00.095195+03 > state | streaming > sent_location | 62/BC000000 > write_location | 62/BC000000 > flush_location | 62/BC000000 > replay_location | 62/BC000000 > sync_priority | 0 > sync_state | async > > standby: > postgres=# SELECT (extract(epoch from now() - > pg_last_xact_replay_timestamp() ))::int AS slave_lag; > slave_lag > ----------- > 1030 > > on master : > postgres=# select name,setting from pg_settings where lower (category) like > '%write%log%'; > name | setting > ------------------------------+------------------------------------------------------------------- > archive_command | cp -i %p > /var/lib/postgresql/9.3/main/pg_archivelog/%f </dev/null > archive_mode | on > archive_timeout | 600 > checkpoint_completion_target | 0.7 > checkpoint_segments | 32 > checkpoint_timeout | 300 > checkpoint_warning | 30 > commit_delay | 0 > commit_siblings | 5 > fsync | on > full_page_writes | on > synchronous_commit | on > wal_buffers | 2048 > wal_level | hot_standby > wal_sync_method | fdatasync > wal_writer_delay | 200 > (16 rows) Isn't your mistake the fact that you rely on the assumption that replication lag measured in terms of timestamp is a good thing while it should be estimated in terms of byte difference by comparing WAL positions between the master and its standbys? -- Michael
Hello, thank t for response, measure in bytes may bemore correct, but to bring it to the customer? :) I think it is easier to say that the standby database lags behind master no more than 15 minutes, than the fact that it differs for 1 megabyte. ps: sorry for my English On 06/15/2015 02:57 AM, Michael Paquier wrote: > Isn't your mistake the fact that you rely on the assumption that > replication lag measured in terms of timestamp is a good thing while > it should be estimated in terms of byte difference by comparing WAL > positions between the master and its standbys?
On Mon, Jun 15, 2015 at 9:04 AM, Anton Bushmelev <djeday84@gmail.com> wrote: > Hello, thank t for response, measure in bytes may bemore correct, but to > bring it to the customer? :) I think it is easier to say that the standby > database lags behind master no more than 15 minutes, than the fact that it > differs for 1 megabyte. > ps: sorry for my English > > > On 06/15/2015 02:57 AM, Michael Paquier wrote: >> >> Isn't your mistake the fact that you rely on the assumption that >> replication lag measured in terms of timestamp is a good thing while >> it should be estimated in terms of byte difference by comparing WAL >> positions between the master and its standbys? Comparing pg_last_xact_replay_timestamp() with now() to measure replication lag makes little sense: this function shows the timestamp of the *last transaction replayed* during recovery (see here: http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL ). Hence if your master server has no activity for a certain amount of time, meaning that no transactions could be replayed on the standby, this will continuously increase. -- Michael
use this query: SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replication_lag; found at http://www.postgresql.org/message-id/CADKbJJWz9M0swPT3oqe8f9+tfD4-F54uE6Xtkh4nERpVsQnjnw@mail.gmail.com On 06/15/2015 04:24 AM, Michael Paquier wrote: > On Mon, Jun 15, 2015 at 9:04 AM, Anton Bushmelev <djeday84@gmail.com> wrote: >> Hello, thank t for response, measure in bytes may bemore correct, but to >> bring it to the customer? :) I think it is easier to say that the standby >> database lags behind master no more than 15 minutes, than the fact that it >> differs for 1 megabyte. >> ps: sorry for my English >> >> >> On 06/15/2015 02:57 AM, Michael Paquier wrote: >>> Isn't your mistake the fact that you rely on the assumption that >>> replication lag measured in terms of timestamp is a good thing while >>> it should be estimated in terms of byte difference by comparing WAL >>> positions between the master and its standbys? > Comparing pg_last_xact_replay_timestamp() with now() to measure > replication lag makes little sense: this function shows the timestamp > of the *last transaction replayed* during recovery (see here: > http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL > ). Hence if your master server has no activity for a certain amount of > time, meaning that no transactions could be replayed on the standby, > this will continuously increase.