Thread: pg_stat_replication shows sync standby with flush location behind primary in 9.1.5

I am seeing the situation where the reported flush location for the sync
standby (standby1 below) is *behind* the reported current xlog location
of the primary. This is Postgres 9.1.5, and I was under the impression
that transactions initiated on the master do not commit until the
corresponding wal is flushed on the sync standby.

Now the standby is definitely working in sync mode, because stopping it
halts all write transactions on the primary (sync_standby_names contains
only standby1). So is the reported lag in flush location merely an
artifact of timing in the query, or is there something else going on? [1]

db=# SELECT application_name, pg_current_xlog_location(), sent_location, write_location,
            flush_location, replay_location, sync_priority, state
      FROM pg_stat_replication where replay_location is not null;
 application_name | pg_current_xlog_location | sent_location | write_location | flush_location | replay_location | sync_priority |   state
------------------+--------------------------+---------------+----------------+----------------+-----------------+---------------+-----------
 standby1         | E/254909E0               | E/25490000    | E/2548C3B8     | E/2548C3B8     | E/25476DE0      |             1 | streaming   <===
 standby2         | E/254909E0               | E/2548C3B8    | E/25476DE0     | E/25476DE0     | E/254724C0      |             0 | streaming
 standby3         | E/254909E0               | E/254909E0    | E/25476DE0     | E/25476DE0     | E/254724C0      |             0 | streaming
 standby4         | E/254909E0               | E/25490000    | E/2548C3B8     | E/25476DE0     | E/25476DE0      |             0 | streaming
 standby5         | E/254909E0               | E/25490000    | E/25476DE0     | E/25476DE0     | E/254724C0      |             0 | streaming
(5 rows)


Cheers

Mark

[1] Looking at the code for pg_stat_replication, it appears to take the
sync rep lock while reporting, so in theory should be exactly right...I
should perhaps check what pg_current_xlog_location does...
On 4 October 2012 05:32, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
> I am seeing the situation where the reported flush location for the sync
> standby (standby1 below) is *behind* the reported current xlog location of
> the primary. This is Postgres 9.1.5, and I was under the impression that
> transactions initiated on the master do not commit until the corresponding
> wal is flushed on the sync standby.
>
> Now the standby is definitely working in sync mode, because stopping it
> halts all write transactions on the primary (sync_standby_names contains
> only standby1). So is the reported lag in flush location merely an artifact
> of timing in the query, or is there something else going on? [1]

The writing of new WAL is independent of the wait that occurs on
commit, so it is entirely possible, even desirable, that the observed
effect occurs.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 04/10/12 19:06, Simon Riggs wrote:
> On 4 October 2012 05:32, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
>> I am seeing the situation where the reported flush location for the sync
>> standby (standby1 below) is *behind* the reported current xlog location of
>> the primary. This is Postgres 9.1.5, and I was under the impression that
>> transactions initiated on the master do not commit until the corresponding
>> wal is flushed on the sync standby.
>>
>> Now the standby is definitely working in sync mode, because stopping it
>> halts all write transactions on the primary (sync_standby_names contains
>> only standby1). So is the reported lag in flush location merely an artifact
>> of timing in the query, or is there something else going on? [1]
>
> The writing of new WAL is independent of the wait that occurs on
> commit, so it is entirely possible, even desirable, that the observed
> effect occurs.
>

Ah right - it did occur to me (after posting, of course) that *other*
non-commit WAL could be causing the effect... thank you for clarifying!

This could be worth mentioning in the docs for the view, as the context
in which I've encountered this effect is folks writing scripts to
monitor replication lag etc.
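
For anyone writing such a script against 9.1, here is a rough sketch of
the arithmetic, not a definitive implementation. It turns two xlog
location strings into a byte distance on the client side. The function
names are made up for illustration, and the 0xFF000000 bytes-per-xlog-id
constant is an assumption about the pre-9.3 WAL layout that should be
checked against your server version.

```python
# Sketch only: byte distance between two 9.1-style xlog locations.
# ASSUMPTION: pre-9.3 layout, where each xlog id spans 0xFF000000 bytes.
XLOG_FILE_SIZE = 0xFF000000


def xlog_to_bytes(location):
    """Convert a location such as 'E/254909E0' to an absolute byte count."""
    xlogid, offset = location.split("/")
    return int(xlogid, 16) * XLOG_FILE_SIZE + int(offset, 16)


def flush_lag_bytes(primary_location, standby_flush_location):
    """Bytes of WAL written on the primary but not yet flushed on the standby."""
    return xlog_to_bytes(primary_location) - xlog_to_bytes(standby_flush_location)


# Values for standby1 taken from the query output earlier in the thread.
lag = flush_lag_bytes("E/254909E0", "E/2548C3B8")
print(lag)
```

As per Simon's reply, a small positive result for the sync standby is
expected and is not by itself a sign of trouble, so a monitoring script
should alert on a threshold rather than on any non-zero value.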

Cheers

Mark