Thread: pg_last_xact_replay_timestamp lies

pg_last_xact_replay_timestamp lies

From

Anton Bushmelev

Date:

15 June 2015, 02:31:04

Hello, dear guru =) help me to fix situation when no change made on primary and standby show lag more than 15 minutes:
master :
postgres=# select * from pg_stat_replication ;-[ RECORD 1 ]----+------------------------------pid              | 18553usesysid         | 117942usename          | replicationapplication_name | walreceiverclient_addr      | 10.62.43.60client_hostname | client_port      | 45281backend_start    | 2015-06-15 00:02:00.095195+03state            | streamingsent_location    | 62/BC000000write_location   | 62/BC000000flush_location   | 62/BC000000replay_location | 62/BC000000sync_priority    | 0
sync_state       | async

standby:
postgres=# SELECT (extract(epoch from now() - pg_last_xact_replay_timestamp() ))::int AS slave_lag; slave_lag -----------      1030
on master :
postgres=# select name,setting from pg_settings where lower (category) like '%write%log%';             name             |                              setting                              ------------------------------+------------------------------------------------------------------- archive_command              | cp -i %p /var/lib/postgresql/9.3/main/pg_archivelog/%f </dev/null archive_mode                 | on archive_timeout              | 600 checkpoint_completion_target | 0.7 checkpoint_segments          | 32 checkpoint_timeout           | 300 checkpoint_warning           | 30 commit_delay                 | 0 commit_siblings              | 5 fsync                        | on full_page_writes             | on synchronous_commit           | on wal_buffers                  | 2048 wal_level                    | hot_standby wal_sync_method              | fdatasync wal_writer_delay             | 200(16 rows)

Re: pg_last_xact_replay_timestamp lies

From

Michael Paquier

Date:

15 June 2015, 02:57:43

On Mon, Jun 15, 2015 at 8:30 AM, Anton Bushmelev <djeday84@gmail.com> wrote:
> Hello, dear guru =) help me to fix situation when no change made on primary
> and standby show lag more than 15 minutes:
> master :
> postgres=# select * from pg_stat_replication
> ;
> -[ RECORD 1 ]----+------------------------------
> pid              | 18553
> usesysid         | 117942
> usename          | replication
> application_name | walreceiver
> client_addr      | 10.62.43.60
> client_hostname  |
> client_port      | 45281
> backend_start    | 2015-06-15 00:02:00.095195+03
> state            | streaming
> sent_location    | 62/BC000000
> write_location   | 62/BC000000
> flush_location   | 62/BC000000
> replay_location  | 62/BC000000
> sync_priority    | 0
> sync_state       | async
>
> standby:
> postgres=# SELECT (extract(epoch from now() -
> pg_last_xact_replay_timestamp() ))::int AS slave_lag;
>  slave_lag
> -----------
>       1030
>
> on master :
> postgres=# select name,setting from pg_settings where lower (category) like
> '%write%log%';
>              name             |                              setting
> ------------------------------+-------------------------------------------------------------------
>  archive_command              | cp -i %p
> /var/lib/postgresql/9.3/main/pg_archivelog/%f </dev/null
>  archive_mode                 | on
>  archive_timeout              | 600
>  checkpoint_completion_target | 0.7
>  checkpoint_segments          | 32
>  checkpoint_timeout           | 300
>  checkpoint_warning           | 30
>  commit_delay                 | 0
>  commit_siblings              | 5
>  fsync                        | on
>  full_page_writes             | on
>  synchronous_commit           | on
>  wal_buffers                  | 2048
>  wal_level                    | hot_standby
>  wal_sync_method              | fdatasync
>  wal_writer_delay             | 200
> (16 rows)

Isn't your mistake the fact that you rely on the assumption that
replication lag measured in terms of timestamp is a good thing while
it should be estimated in terms of byte difference by comparing WAL
positions between the master and its standbys?
--
Michael

Re: pg_last_xact_replay_timestamp lies

From

Anton Bushmelev

Date:

15 June 2015, 03:05:09

Hello, thank t for response, measure in bytes may bemore correct, but to
bring it to the customer? :) I think it is easier to say that the
standby database lags behind master no more than 15 minutes, than the
fact that it differs for 1 megabyte.
ps: sorry for my English

On 06/15/2015 02:57 AM, Michael Paquier wrote:
> Isn't your mistake the fact that you rely on the assumption that
> replication lag measured in terms of timestamp is a good thing while
> it should be estimated in terms of byte difference by comparing WAL
> positions between the master and its standbys?

Re: pg_last_xact_replay_timestamp lies

From

Michael Paquier

Date:

15 June 2015, 04:24:48

On Mon, Jun 15, 2015 at 9:04 AM, Anton Bushmelev <djeday84@gmail.com> wrote:
> Hello, thank t for response, measure in bytes may bemore correct, but to
> bring it to the customer? :) I think it is easier to say that the standby
> database lags behind master no more than 15 minutes, than the fact that it
> differs for 1 megabyte.
> ps: sorry for my English
>
>
> On 06/15/2015 02:57 AM, Michael Paquier wrote:
>>
>> Isn't your mistake the fact that you rely on the assumption that
>> replication lag measured in terms of timestamp is a good thing while
>> it should be estimated in terms of byte difference by comparing WAL
>> positions between the master and its standbys?

Comparing pg_last_xact_replay_timestamp() with now() to measure
replication lag makes little sense: this function shows the timestamp
of the *last transaction replayed* during recovery (see here:
http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL
). Hence if your master server has no activity for a certain amount of
time, meaning that no transactions could be replayed on the standby,
this will continuously increase.
--
Michael

Re: pg_last_xact_replay_timestamp lies

From

Anton

Date:

15 June 2015, 18:06:19

use this query:
SELECT
CASE
WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0
  ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER
END
AS replication_lag;

found at
http://www.postgresql.org/message-id/CADKbJJWz9M0swPT3oqe8f9+tfD4-F54uE6Xtkh4nERpVsQnjnw@mail.gmail.com


On 06/15/2015 04:24 AM, Michael Paquier wrote:
> On Mon, Jun 15, 2015 at 9:04 AM, Anton Bushmelev <djeday84@gmail.com> wrote:
>> Hello, thank t for response, measure in bytes may bemore correct, but to
>> bring it to the customer? :) I think it is easier to say that the standby
>> database lags behind master no more than 15 minutes, than the fact that it
>> differs for 1 megabyte.
>> ps: sorry for my English
>>
>>
>> On 06/15/2015 02:57 AM, Michael Paquier wrote:
>>> Isn't your mistake the fact that you rely on the assumption that
>>> replication lag measured in terms of timestamp is a good thing while
>>> it should be estimated in terms of byte difference by comparing WAL
>>> positions between the master and its standbys?
> Comparing pg_last_xact_replay_timestamp() with now() to measure
> replication lag makes little sense: this function shows the timestamp
> of the *last transaction replayed* during recovery (see here:
> http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL
> ). Hence if your master server has no activity for a certain amount of
> time, meaning that no transactions could be replayed on the standby,
> this will continuously increase.