On 09/14/2014 07:03 AM, Torsten Förtsch wrote:
> Hi,
>
> I noticed a strange behaviour regarding pg_stat_replication in 9.3. If
> called from psql using the \watch command, I see all my replicas. From
> time to time one of them drops out and reconnects in a short period of
> time, typically ~30 sec.
>
> If I use the same select in plpgsql like:
>
> FOR r in SELECT application_name,
> client_addr,
> flush_location, clock_timestamp() AS lmd
> FROM pg_stat_replication
> ORDER BY application_name, client_addr
> LOOP
> RAISE NOTICE 'aname=%, ca=%, lmd=%, loc=%, cur=%, lag=%',
> r.application_name, r.client_addr, r.lmd,
> r.flush_location,
> pg_current_xlog_location(),
> pg_size_pretty(
> pg_xlog_location_diff(
> pg_current_xlog_location(),
> r.flush_location
> )
> );
> END LOOP;
>
> I see one of the replicas dropping out but never coming back again while
> in a parallel session using psql and \watch it indeed does come back.
>
> Is that intended?
>
> Torsten
>
>
I wonder if its a transaction thing? Maybe \watch is using a transaction for each (or isn't using transactions at
all),whereas the plpgsql is one long transaction?
Also if one of your replicas is far away, it doesn't really surprise me that it might loose connection every once and a
while. On the other hand, if the box is on the same subnet, right next to the master, and it was loosing connection,
thatwould be a bad thing.
So, how far away is the replica? And does 'ps ax|grep postgr' show 'idle' or 'idle in transaction' on the \watch and
theplpgsql?
-Andy