Thread: Logical replication and wal segment retention
Hello, folks,

Yesterday, I had a small file system fill up, due to some logical replication testing we had been performing. We had been testing IBM’s IIDR system and apparently it had built a logical replication slot on my server. When the test was completed, nobody removed the slot, so WAL segments stopped being dropped. Now I can understand the difficulty separating what physical versus logical replication needs from the WAL segments, but as logical replication is database specific, not cluster wide, this behavior was a little unexpected, since the WAL segments are cluster wide. Are WAL segments going to pile up whenever something drops a logical replication connection? I’ve seen it, but it seems like this could be a bad thing.

- Jay

Sent from my iPhone
Hi Jay,

On Wed, Feb 27, 2019 at 07:40:26AM -0500, John Scalia wrote:
> Hello, folks,
>
> Yesterday, I had a small file system fill up, due to some logical
> replication testing we had been performing. We had been testing IBM’s IIDR
> system and apparently it had built a logical replication slot on my server.
> When the test was completed, nobody removed the slot, so WAL segments
> stopped being dropped. Now I can understand the difficulty separating what
> physical versus logical replication needs from the WAL segments, but as
> logical replication is database specific not cluster wide, this behavior was
> a little unexpected, since the WAL segments are cluster wide. Are WAL
> segments going to pile up whenever something drops a logical replication
> connection? I’ve seen it, but it seems like this could be a bad thing.

Since logical replication is piggybacked on physical replication, you cannot use the former without the latter. And yes, what you experienced is one of the dangers of using replication slots with a busy database (i.e. one producing lots of WAL) and a filesystem with little excess space. Under these circumstances, it is imperative to monitor (and alert on) anything going awry with your replication slot consumers, and/or the size of your wal/xlog directory. It's a feature of replication slots to work that way - but one that may end up biting you.

--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.
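A monitoring query along these lines will show how much WAL each slot is holding back - a sketch assuming PostgreSQL 10 or later (on 9.x the equivalent functions are pg_current_xlog_location() and pg_xlog_location_diff()):

```sql
-- List every replication slot and the amount of WAL it is pinning.
-- A slot with active = false and a large retained_wal value is the
-- kind of orphan that eventually fills the WAL filesystem.
SELECT slot_name,
       slot_type,
       database,          -- NULL for physical slots, set for logical ones
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```

Feeding this into the existing disk monitor would catch a stuck slot long before the filesystem fills.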
On 27/2/19 2:52 p.m., Johannes Truschnigg wrote:
> Hi Jay,
>
> On Wed, Feb 27, 2019 at 07:40:26AM -0500, John Scalia wrote:
>> Hello, folks,
>>
>> Yesterday, I had a small file system fill up, due to some logical
>> replication testing we had been performing. We had been testing IBM’s IIDR
>> system and apparently it had built a logical replication slot on my server.
>> When the test was completed, nobody removed the slot, so WAL segments
>> stopped being dropped. Now I can understand the difficulty separating what
>> physical versus logical replication needs from the WAL segments, but as
>> logical replication is database specific not cluster wide, this behavior was
>> a little unexpected, since the WAL segments are cluster wide. Are WAL
>> segments going to pile up whenever something drops a logical replication
>> connection? I’ve seen it, but it seems like this could be a bad thing.
> Since Logical Replication is piggybacked on Physical Replication, you cannot
> use the first without having the latter. And yes, what you experienced is one
> of the dangers of using replication slots when having a busy database (i.e.
> producing lots of WAL) and a filesystem with little excess space. Under these
> circumstances, it is imperative to monitor for (and alert on) anything going
> awry with your replication slot consumers, and/or the size of your wal/xlog
> directory. It's a feature of replication slots to work that way - but one that
> may end up biting you.

A logical approach for replication slots would be to accept a parameter specifying the maximum number of WAL files to retain, after which newer WALs would be removed and the primary server saved. Pretty much like the --archive-push-queue-max argument of pgbackrest.

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
I thought as much. The basic problem, however, is that I never created the logical slots. The IIDR application did that all by itself, and after it terminated, it did not bother to remove the slots. So, my disk monitor threw up when the WAL file system began to fill up. I was trying then to figure out why it did that.

Sent from my iPhone

> On Feb 27, 2019, at 7:57 AM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
>
>> On 27/2/19 2:52 p.m., Johannes Truschnigg wrote:
>> Hi Jay,
>>
>>> On Wed, Feb 27, 2019 at 07:40:26AM -0500, John Scalia wrote:
>>> Hello, folks,
>>>
>>> Yesterday, I had a small file system fill up, due to some logical
>>> replication testing we had been performing. We had been testing IBM’s IIDR
>>> system and apparently it had built a logical replication slot on my server.
>>> When the test was completed, nobody removed the slot, so WAL segments
>>> stopped being dropped. Now I can understand the difficulty separating what
>>> physical versus logical replication needs from the WAL segments, but as
>>> logical replication is database specific not cluster wide, this behavior was
>>> a little unexpected, since the WAL segments are cluster wide. Are WAL
>>> segments going to pile up whenever something drops a logical replication
>>> connection? I’ve seen it, but it seems like this could be a bad thing.
>> Since Logical Replication is piggybacked on Physical Replication, you cannot
>> use the first without having the latter. And yes, what you experienced is one
>> of the dangers of using replication slots when having a busy database (i.e.
>> producing lots of WAL) and a filesystem with little excess space. Under these
>> circumstances, it is imperative to monitor for (and alert on) anything going
>> awry with your replication slot consumers, and/or the size of your wal/xlog
>> directory. It's a feature of replication slots to work that way - but one that
>> may end up biting you.
>
> A logical approach for replication slots would be to accept a parameter regarding max WAL files to retain, after which newer WALs will be removed and the primary server saved. Pretty much like : --archive-push-queue-max argument of pgbackrest.
>
> --
> Achilleas Mantzios
> IT DEV Lead
> IT DEPT
> Dynacom Tankers Mgmt
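For anyone hitting the same situation: once it is certain the consumer is gone for good, an orphaned slot can be removed by hand. A sketch (the slot name 'iidr_slot' is made up for illustration; substitute the real slot_name from the first query):

```sql
-- Find slots that nothing is consuming any more.
SELECT slot_name, slot_type, database
FROM pg_replication_slots
WHERE NOT active;

-- Drop one of them; the server can then recycle the WAL it was pinning
-- at the next checkpoint.
SELECT pg_drop_replication_slot('iidr_slot');
```

Dropping a slot that a consumer still intends to resume from will force that consumer to re-sync from scratch, so verify first.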
On Wed, Feb 27, 2019 at 02:57:46PM +0200, Achilleas Mantzios wrote:
> [...]
> A logical approach for replication slots would be to accept a parameter
> regarding max WAL files to retain, after which newer WALs will be removed
> and the primary server saved. Pretty much like : --archive-push-queue-max
> argument of pgbackrest .

Before replication slots were a thing, you had to carefully balance wal_keep_segments against WAL production and/or (usually and :)) set up a proper WAL archive, so that replication could soldier on even after a WAL receiver experienced service-interrupting trouble for a while. The benefit of that was that the WAL producer remained unaffected by such calamities (unless you bungled the archiving process profoundly). To me, that was the preferred trade-off for all the use-cases of replication I personally encountered.

If it were possible to have the best of both worlds (i.e. a kind of "high water mark number of WAL segments" setting per replication slot, over which the slot would be abandoned by the producer - with a heavy heart and lots of screaming in the logs, of course), that sure would be awesome. But at this time, we are where we are :)

--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.
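Lacking such a per-slot high-water mark in the server itself, the closest approximation is an alerting query with a hand-picked threshold - a sketch assuming PostgreSQL 10+, with the 10 GiB figure chosen purely as an example:

```sql
-- Raise an alert for any slot holding back more WAL than the threshold.
-- 10 GiB here is an arbitrary example value; tune it to the filesystem.
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
      > 10::numeric * 1024 * 1024 * 1024;
```

Any row returned means a human (or a script) should decide whether to fix the consumer or drop the slot before pg_wal fills up.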
On Wed, Feb 27, 2019 at 08:29:09AM -0500, John Scalia wrote:
> I thought as much. The basic problem, however, is that I never created the
> logical slots. The IIDR application did that all by itself, and after it
> terminated, it did not bother to remove the slots. So, my disk monitor threw
> up when the WAL file system began to fill up. I was trying then to figure
> out why it did that.

I don't know the particular product that made you experience these troubles, but it could be on purpose (if it relies on consuming the WAL continuously, like a proper streaming replication slave/secondary would, and expects to be able to continue working where it left off before terminating) - or it could be a rather dangerous usability hurdle that should, at the very least, be clearly documented.

--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.
When using replication slots with standby nodes, a master node retains the necessary WAL files in pg_xlog until the standby has received them. The cost is having to monitor the space used by WAL files in pg_xlog, as the disk space those files use is no longer strictly controlled by wal_keep_segments or checkpoint_segments, but by elements (perhaps) external to the server where the master node is running.
In the case of a standby node using streaming replication without a slot, the server does not actually wait for the standby to catch up if it disconnects, and simply deletes the WAL files that are no longer needed. This has the advantage of making the disk space used by WAL files easier to manage: checkpoint_segments still applies in this case, and the amount of WAL to keep on the master side can likewise be tuned with wal_keep_segments.
On Wed, Feb 27, 2019 at 6:10 PM John Scalia <jayknowsunix@gmail.com> wrote:
Hello, folks,
Yesterday, I had a small file system fill up, due to some logical replication testing we had been performing. We had been testing IBM’s IIDR system and apparently it had built a logical replication slot on my server. When the test was completed, nobody removed the slot, so WAL segments stopped being dropped. Now I can understand the difficulty separating what physical versus logical replication needs from the WAL segments, but as logical replication is database specific not cluster wide, this behavior was a little unexpected, since the WAL segments are cluster wide. Are WAL segments going to pile up whenever something drops a logical replication connection? I’ve seen it, but it seems like this could be a bad thing.
-
Jay
Sent from my iPhone
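For reference, the slot-less retention behavior described in the last message maps to settings along these lines in postgresql.conf. The values are arbitrary examples, the archive_command is a placeholder rather than a production-ready recipe, and checkpoint_segments only exists on pre-9.5 servers (replaced there by max_wal_size):

```
# postgresql.conf -- example values only
wal_keep_segments = 64                          # keep up to 64 extra 16MB segments (~1 GiB)
archive_mode = on                               # and/or archive WAL for standbys to replay
archive_command = 'cp %p /path/to/archive/%f'   # placeholder; use a robust archiver in practice
```

With these, a disconnected standby can fall back to the archive, and the master's WAL usage stays bounded regardless of what its consumers do.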