Thread: physical slot xmin dependency on logical slot?

physical slot xmin dependency on logical slot?

From
Jeremy Finzel
Date:
We had a scenario today that was new to us.  We had a logical replication slot that was severely far behind.  Before dropping this logical slot, we made a physical point-in-time-recovery snapshot of the system with this logical slot.

This logical slot was causing severe catalog bloat.  We proceeded to drop the logical slot, which was over 12000 WAL segments behind.  The physical slot was only a few hundred segments behind and still in place.

But running VACUUM FULL on the catalog tables afterwards did not recover any bloat past the horizon that the now-dropped logical slot had been holding.  Eventually, to our surprise, we found that dropping the physical slot allowed us to recover the bloat.

We saw in forensics after the fact that xmin of the physical slot equaled the catalog_xmin of the logical slot.  Is there some dependency here where physical slots made of a system retain all transactions of logical slots it contains as well?  If so, could someone help us understand this, and is there documentation around this?  Is this by design?

We had thought that the physical slot would only retain the WAL it needed for its own restart_lsn, not the segments needed by only logical slots as well.  Any explanation would be much appreciated!

Thanks,
Jeremy

Re: physical slot xmin dependency on logical slot?

From
Andres Freund
Date:
Hi,

On 2019-11-18 15:36:47 -0600, Jeremy Finzel wrote:
> We had a scenario today that was new to us.  We had a logical replication
> slot that was severely far behind.  Before dropping this logical slot, we
> made a physical point-in-time-recovery snapshot of the system with this
> logical slot.

> This logical slot was causing severe catalog bloat.  We proceeded to drop
> the logical slot, which was over 12000 WAL segments behind.  The physical
> slot was only a few hundred segments behind and still in place.
> 
> But running VACUUM FULL on the catalog tables afterwards did not recover
> any bloat past the horizon that the now-dropped logical slot had been
> holding.  Eventually, to our surprise, we found that dropping the physical
> slot allowed us to recover the bloat.
> 
> We saw in forensics after the fact that xmin of the physical slot equaled
> the catalog_xmin of the logical slot.  Is there some dependency here where
> physical slots made of a system retain all transactions of logical slots it
> contains as well?  If so, could someone help us understand this, and is
> there documentation around this?  Is this by design?
> 
> We had thought that the physical slot would only retain the WAL it needed
> for its own restart_lsn, not the segments needed by only logical slots as
> well.  Any explanation would be much appreciated!

The logical slot on the standby affects hot_standby_feedback, which in
turn means that the physical slot also transports xmin horizons to the
primary.

Note that our docs suggest to drop slots when cloning a node (and
pg_basebackup/basebackup on the server side do so automatically):
   <para>
    It is often a good idea to also omit from the backup the files
    within the cluster's <filename>pg_replslot/</filename> directory, so that
    replication slots that exist on the master do not become part of the
    backup.  Otherwise, the subsequent use of the backup to create a standby
    may result in indefinite retention of WAL files on the standby, and
    possibly bloat on the master if hot standby feedback is enabled, because
    the clients that are using those replication slots will still be connecting
    to and updating the slots on the master, not the standby.  Even if the
    backup is only intended for use in creating a new master, copying the
    replication slots isn't expected to be particularly useful, since the
    contents of those slots will likely be badly out of date by the time
    the new master comes on line.
   </para>

It's generally useful to look at pg_stat_replication for these kinds of
things...
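
As a rough illustration of that kind of check, here is a minimal Python sketch. The two query strings only use columns that exist in the pg_replication_slots and pg_stat_replication views; the helper name slots_holding_horizon and the sample rows are hypothetical, and XIDs are compared as plain integers, which ignores wraparound.

```python
# Queries one might run on the primary. The row-handling below assumes each
# row is available as a dict keyed by column name (driver-dependent).
SLOT_QUERY = """
SELECT slot_name, slot_type, active, xmin, catalog_xmin, restart_lsn
FROM pg_replication_slots
"""

FEEDBACK_QUERY = """
SELECT application_name, state, backend_xmin
FROM pg_stat_replication
"""


def slots_holding_horizon(rows):
    """List (slot_name, column, xid) for each slot that pins a horizon.

    Sorted oldest first. Assumption: plain integer ordering of XIDs,
    so wraparound is not handled.
    """
    held = []
    for row in rows:
        for col in ("xmin", "catalog_xmin"):
            if row.get(col) is not None:
                held.append((row["slot_name"], col, int(row[col])))
    return sorted(held, key=lambda entry: entry[2])


# Hypothetical rows, shaped like the output of SLOT_QUERY.
sample_rows = [
    {"slot_name": "other_logical", "slot_type": "logical",
     "xmin": None, "catalog_xmin": "4800"},
    {"slot_name": "standby_phys", "slot_type": "physical",
     "xmin": "1200", "catalog_xmin": None},
]

for name, column, xid in slots_holding_horizon(sample_rows):
    print(f"{name}: {column} = {xid}")
```

With the sample rows, the physical slot is listed first because it holds the oldest horizon, which is the pattern described in this thread.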

Greetings,

Andres Freund



Re: physical slot xmin dependency on logical slot?

From
Craig Ringer
Date:
On Tue, 19 Nov 2019 at 05:37, Jeremy Finzel <finzelj@gmail.com> wrote:
> We had a scenario today that was new to us.  We had a logical replication slot that was severely far behind.  Before dropping this logical slot, we made a physical point-in-time-recovery snapshot of the system with this logical slot.
>
> This logical slot was causing severe catalog bloat.  We proceeded to drop the logical slot, which was over 12000 WAL segments behind.  The physical slot was only a few hundred segments behind and still in place.
>
> But running VACUUM FULL on the catalog tables afterwards did not recover any bloat past the horizon that the now-dropped logical slot had been holding.  Eventually, to our surprise, we found that dropping the physical slot allowed us to recover the bloat.
>
> We saw in forensics after the fact that xmin of the physical slot equaled the catalog_xmin of the logical slot.  Is there some dependency here where physical slots made of a system retain all transactions of logical slots it contains as well?  If so, could someone help us understand this, and is there documentation around this?  Is this by design?

I expect that you created the replica in a manner that preserved the logical replication slot on it. You also had hot_standby_feedback enabled.

PostgreSQL standbys send the global xmin and (in Pg10+) catalog_xmin to the upstream when hot_standby_feedback is enabled. If there's a slot holding down the catalog_xmin on the replica, that'll be passed on via hot_standby_feedback to the upstream. On Pg 9.6 or older, or if the replica isn't using a physical replication slot, the catalog_xmin is treated as a regular xmin since there's nowhere in PGPROC or PGXACT to track the separate catalog_xmin. If the standby uses a physical slot, then on Pg10+ the catalog_xmin sent by the replica is stored as the catalog_xmin on the physical slot instead.

Either way, if you have hot_standby_feedback enabled on a standby, that feedback includes the requirements of any replication slots on the standby.
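
The rules above can be sketched as a toy Python model. This is only an illustration of the behaviour described in this message, not PostgreSQL's actual code: the function names and sample XIDs are hypothetical, XIDs are plain integers with no wraparound handling, and pg_major = 9 stands in for "9.6 or older".

```python
from typing import Dict, Iterable, Optional, Tuple

Xid = Optional[int]


def oldest(xids: Iterable[Xid]) -> Xid:
    """Smallest known XID, or None if nothing is being held."""
    known = [x for x in xids if x is not None]
    return min(known) if known else None


def standby_feedback(backend_xmin: Xid,
                     slot_xmins: Iterable[Xid],
                     slot_catalog_xmins: Iterable[Xid]) -> Tuple[Xid, Xid]:
    """(xmin, catalog_xmin) a standby reports with hot_standby_feedback on.

    Slots that exist on the standby contribute to both values.
    """
    xmin = oldest([backend_xmin, *slot_xmins])
    catalog_xmin = oldest(slot_catalog_xmins)
    return xmin, catalog_xmin


def horizons_on_primary(feedback: Tuple[Xid, Xid],
                        uses_physical_slot: bool,
                        pg_major: int) -> Dict[str, Xid]:
    """What the primary ends up retaining for this standby (simplified)."""
    xmin, catalog_xmin = feedback
    if uses_physical_slot and pg_major >= 10:
        # Tracked separately on the physical slot.
        return {"xmin": xmin, "catalog_xmin": catalog_xmin}
    # No place to keep catalog_xmin separately: it acts as a regular xmin.
    return {"xmin": oldest([xmin, catalog_xmin]), "catalog_xmin": None}


# A standby that kept a copied logical slot with catalog_xmin = 1200.
fb = standby_feedback(backend_xmin=5000, slot_xmins=[], slot_catalog_xmins=[1200])
print(horizons_on_primary(fb, uses_physical_slot=True, pg_major=11))
# {'xmin': 5000, 'catalog_xmin': 1200}
print(horizons_on_primary(fb, uses_physical_slot=True, pg_major=9))
# {'xmin': 1200, 'catalog_xmin': None}
```

In the second case the physical slot's xmin ends up equal to the logical slot's catalog_xmin, which matches the symptom reported at the start of the thread.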
 
--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: physical slot xmin dependency on logical slot?

From
Jeremy Finzel
Date:
> I expect that you created the replica in a manner that preserved the logical replication slot on it. You also had hot_standby_feedback enabled.

As per both your and Andres' replies, we wanted the backup to have the logical slots on it, because we wanted to allow decoding from the slots on our backup.  However, what we should have done is drop the slot of the backup on the master.
 
> PostgreSQL standbys send the global xmin and (in Pg10+) catalog_xmin to the upstream when hot_standby_feedback is enabled. If there's a slot holding down the catalog_xmin on the replica, that'll be passed on via hot_standby_feedback to the upstream. On Pg 9.6 or older, or if the replica isn't using a physical replication slot, the catalog_xmin is treated as a regular xmin since there's nowhere in PGPROC or PGXACT to track the separate catalog_xmin. If the standby uses a physical slot, then on Pg10+ the catalog_xmin sent by the replica is stored as the catalog_xmin on the physical slot instead.
>
> Either way, if you have hot_standby_feedback enabled on a standby, that feedback includes the requirements of any replication slots on the standby.

Thank you for the thorough explanation.  As I noted in my reply to Andres, we routinely and intentionally create snapshots with replication slots intact (but we normally drop the slot on the master immediately), so our own use case is rare and it's not surprising that we don't find a thorough explanation of this scenario in the docs.

Thanks,
Jeremy