Hi,
On 2019-11-18 15:36:47 -0600, Jeremy Finzel wrote:
> We had a scenario today that was new to us. We had a logical replication
> slot that was severely far behind. Before dropping this logical slot, we
> made a physical point-in-time-recovery snapshot of the system with this
> logical slot.
> This logical slot was causing severe catalog bloat. We proceeded to drop
> the logical slot which was over 12000 WAL segments behind. The physical
> slot was only a few 100 segments behind and still in place.
>
> But now proceeding to VAC FULL the catalog tables did not recover any bloat
> beyond the now-dropped logical slot. Eventually to our surprise, we found
> that dropping the physical slot allowed us to recover the bloat.
>
> We saw in forensics after the fact that xmin of the physical slot equaled
> the catalog_xmin of the logical slot. Is there some dependency here where
> physical slots made of a system retain all transactions of logical slots it
> contains as well? If so, could someone help us understand this, and is
> there documentation around this? Is this by design?
>
> We had thought that the physical slot would only retain the WAL it needed
> for its own restart_lsn, not the segments needed by only logical slots as
> well. Any explanation would be much appreciated!
The logical slot that was copied onto the standby affects
hot_standby_feedback: the standby reports that slot's horizon upstream,
which in turn means that the physical slot on the primary holds back the
xmin horizon as well.
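To make that chain concrete, here is a deliberately simplified model in
Python. It is only a sketch, not how the server implements it: the function
name and the numbers are made up for illustration, xid wraparound is ignored,
and xmin and catalog_xmin are folded into a single horizon.

```python
from typing import Iterable, Optional


def feedback_xmin(standby_snapshot_xmin: Optional[int],
                  standby_slot_horizons: Iterable[Optional[int]]) -> Optional[int]:
    """Oldest horizon a standby would report upstream (simplified model).

    Hypothetical helper: it takes the oldest of the standby's own snapshot
    xmin and the horizons of any slots that exist on the standby.
    """
    candidates = [x for x in [standby_snapshot_xmin, *standby_slot_horizons]
                  if x is not None]
    return min(candidates) if candidates else None


# Illustrative numbers only: the copied logical slot is far behind.
copied_logical_slot_catalog_xmin = 1000
standby_snapshot_xmin = 5000

# With hot_standby_feedback on, the stale slot drags the reported horizon down.
reported = feedback_xmin(standby_snapshot_xmin, [copied_logical_slot_catalog_xmin])

# In this model the primary records the reported value as the physical slot's
# xmin, so dropping the logical slot on the primary alone changes nothing.
physical_slot_xmin_on_primary = reported
```

In this model the physical slot ends up with the same value as the logical
slot's catalog_xmin, which matches what you saw in your forensics.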
Note that our docs suggest dropping slots when cloning a node (and
pg_basebackup/basebackup on the server side do so automatically):
<para>
It is often a good idea to also omit from the backup the files
within the cluster's <filename>pg_replslot/</filename> directory, so that
replication slots that exist on the master do not become part of the
backup. Otherwise, the subsequent use of the backup to create a standby
may result in indefinite retention of WAL files on the standby, and
possibly bloat on the master if hot standby feedback is enabled, because
the clients that are using those replication slots will still be connecting
to and updating the slots on the master, not the standby. Even if the
backup is only intended for use in creating a new master, copying the
replication slots isn't expected to be particularly useful, since the
contents of those slots will likely be badly out of date by the time
the new master comes on line.
</para>
It's generally useful to look at pg_stat_replication for these kinds of
things...
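As a rough sketch of that kind of check, the Python helper below picks out
whichever slot holds the oldest horizon. It assumes the rows were already
fetched (for example with
SELECT slot_name, xmin, catalog_xmin FROM pg_replication_slots) and that the
xid values were converted to plain integers. The function name is
hypothetical, and the comparison ignores xid wraparound.

```python
from typing import Iterable, Mapping, Optional, Tuple


def oldest_horizon_holder(rows: Iterable[Mapping]) -> Optional[Tuple[str, str, int]]:
    """Return (slot_name, column, value) for the oldest xmin or catalog_xmin.

    Returns None when no row carries a horizon at all.
    """
    best = None
    for row in rows:
        for col in ("xmin", "catalog_xmin"):
            value = row.get(col)
            if value is None:
                continue  # this slot does not hold that horizon
            if best is None or value < best[2]:
                best = (row["slot_name"], col, value)
    return best


# Made-up rows for illustration: a physical slot pinned by feedback and a
# healthy logical slot.
rows = [
    {"slot_name": "standby_phys", "xmin": 1000, "catalog_xmin": None},
    {"slot_name": "other_logical", "xmin": None, "catalog_xmin": 4000},
]
holder = oldest_horizon_holder(rows)
```

The same idea applies to backend_xmin in pg_stat_replication if the standby
is not using a slot.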
Greetings,
Andres Freund