Re: [HACKERS] Restricting maximum keep segments by repslots - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] Restricting maximum keep segments by repslots
Date
Msg-id 20170907.215956.110216588.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: [HACKERS] Restricting maximum keep segments by repslots  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [HACKERS] Restricting maximum keep segments by repslots  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Hello,

At Thu, 07 Sep 2017 14:12:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20170907.141212.227032666.horiguchi.kyotaro@lab.ntt.co.jp>
> > I would like a flag in pg_replication_slots, and possibly also a
> > numerical column that indicates how far away from the critical point
> > each slot is.  That would be great for a monitoring system.
> 
> Great! I'll do that right now.

Done.

In the attached patch, which applies on top of the previous patch, I
added two columns to pg_replication_slots, "live" and "distance". The
first indicates whether the slot will "live" after the next checkpoint.
The second shows how many bytes the checkpoint LSN can advance before
the slot "dies", or how many bytes the slot has lost after "death".


With wal_keep_segments = 1 and max_slot_wal_keep_size = 16MB:

=# select slot_name, restart_lsn, pg_current_wal_lsn(), live, distance from pg_replication_slots;

 slot_name | restart_lsn | pg_current_wal_lsn | live | distance  
-----------+-------------+--------------------+------+-----------
 s1        | 0/162D388   | 0/162D3C0          | t    | 0/29D2CE8

This shows that the checkpoint can advance 0x29d2ce8 bytes before the
slot dies, even if the connection stalls.

 s1        | 0/4001180   | 0/6FFF2B8          | t    | 0/DB8

This is just before the slot loses sync.

 s1        | 0/4001180   | 0/70008A8          | f    | 0/FFEE80

The checkpoint after this removes some required segments.

2017-09-07 19:04:07.677 JST [13720] WARNING:  restart LSN of replication slots is ignored by checkpoint
2017-09-07 19:04:07.677 JST [13720] DETAIL:  Some replication slots have lost required WAL segnents to continue by up to 1 segments.
 

If max_slot_wal_keep_size is not set (0), live is always true and
distance is NULL.

 slot_name | restart_lsn | pg_current_wal_lsn | live | distance  
-----------+-------------+--------------------+------+-----------
 s1        | 0/4001180   | 0/73117A8          | t    |
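
As a rough illustration of how a monitoring system might consume the
two columns, here is a minimal Python sketch. It assumes the values
have already been fetched from pg_replication_slots as a boolean and
a pg_lsn-style string (or None for NULL). The function names, the
status labels and the 16MB warning threshold are hypothetical and are
not part of the patch.

```python
from typing import Optional


def lsn_to_bytes(lsn: str) -> int:
    """Convert a pg_lsn-style string such as '0/29D2CE8' to a byte count.

    The text form is two hexadecimal numbers: the high 32 bits and the
    low 32 bits of a 64-bit position, separated by a slash.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_status(live: bool, distance: Optional[str],
                warn_bytes: int = 16 * 1024 * 1024) -> str:
    """Classify a slot from the proposed "live" and "distance" columns.

    warn_bytes is an arbitrary threshold chosen for this sketch.
    """
    if not live:
        # The next checkpoint removes segments that the slot still needs.
        return "critical"
    if distance is None:
        # max_slot_wal_keep_size is not set, so there is no limit to hit.
        return "unlimited"
    if lsn_to_bytes(distance) < warn_bytes:
        return "warning"
    return "ok"


# The sample rows shown in this message.
samples = [(True, "0/29D2CE8"), (True, "0/DB8"),
           (False, "0/FFEE80"), (True, None)]
for live, distance in samples:
    print(live, distance, slot_status(live, distance))
```

With these inputs the first row is classified as "ok", the second as
"warning", the third as "critical" and the last as "unlimited".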




- The names (or the contents) of the new columns are open to discussion.

- The pg_replication_slots view takes an LWLock on ControlFile and a spinlock on XLogCtl for every slot, but that seems difficult to reduce.
 

- distance seems to mistakenly become 0/0 under certain conditions.

- The result seems almost right, but a more precise check is needed. (In any case, it cannot be perfectly exact.)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
