Re: Including replication slot data in base backups - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Including replication slot data in base backups
Date
Msg-id CA+Tgmoau5MzCSU3CMp2Jnh+8jYHWAgc1KRWtGAJXQveg8KMx8g@mail.gmail.com
In response to Re: Including replication slot data in base backups  (Magnus Hagander <magnus@hagander.net>)
Responses Re: Including replication slot data in base backups
List pgsql-hackers
On Tue, Apr 1, 2014 at 10:45 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Apr 1, 2014 at 2:24 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>>
>> As of now, pg_basebackup creates an empty repository for pg_replslot/
>> in a base backup, forcing the user to recreate slots on other nodes of
>> the cluster with pg_create_*_replication_slot, or copy pg_replslot
>> from another node. This is not really user-friendly especially after a
>> failover where a given slave may not have the replication slot
>> information of the master that it is replacing.
>>
>> The simple patch attached adds a new option in pg_basebackup, called
>> --replication-slot, allowing to include replication slot information
>> in a base backup. This is done by extending the command BASE_BACKUP in
>> the replication protocol.
>
> --replication-slots would be a better name (plural), or probably
> --include-replication-slots. (and that comment also goes for the BASE_BACKUP
> syntax and variables)
>
> But. If you want to solve the failover case, doesn't that mean you need to
> include it in the *replication* stream and not just the base backup?
> Otherwise, what you're sending over might well be out of date set of slots
> once the failover happens? What if the set of replication slots change
> between the time of the basebackup and the failover?

As a general comment, I think that replication slots, while a great
feature, have more than the usual potential for self-inflicted injury.
A replication slot prevents the global xmin from advancing (so your
tables will bloat) and WAL from being removed (so your pg_xlog
directory will fill up and take down the server).  The very last thing
you want to do is to keep around a replication slot that should have
been dropped, and I suspect a decent number of users are going to make
that mistake, just as they do with prepared transactions and backends
left idle in transaction.
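
To make that failure mode concrete, here is a rough sketch of the kind
of check an administrator might run over the contents of
pg_replication_slots. It is only an illustration: the function names,
the threshold, and the sample rows are made up, and it assumes the rows
have already been fetched as (slot_name, active, restart_lsn) tuples
together with the server's current WAL position.

```python
def lsn_to_int(lsn):
    """Convert an LSN in 'hi/lo' hexadecimal text form to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def stale_slots(rows, current_lsn, max_retained_bytes):
    """Return (slot_name, retained_bytes) for inactive slots pinning too much WAL.

    rows is an iterable of (slot_name, active, restart_lsn) tuples, assumed
    to have been read from pg_replication_slots by the caller.
    """
    now = lsn_to_int(current_lsn)
    flagged = []
    for slot_name, active, restart_lsn in rows:
        # Skip slots in use, and slots that have not reserved any WAL yet.
        if active or restart_lsn is None:
            continue
        retained = now - lsn_to_int(restart_lsn)
        if retained > max_retained_bytes:
            flagged.append((slot_name, retained))
    return flagged


# Hypothetical sample data, not taken from a real server.
rows = [
    ("standby_a", True, "0/5000000"),
    ("standby_b", False, "0/1000000"),
    ("never_used", False, None),
]
print(stale_slots(rows, "0/6000000", 64 * 1024 * 1024))
```

An inactive slot is not necessarily a forgotten one, so a report like
this is a prompt for a human to look, not a reason to drop anything
automatically.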

So I view this proposal with a bit of skepticism for that reason.  If
you end up copying the replication slots when you didn't really want
to, or when you only wanted some of them, you will be sad.  In
particular, suppose you have a master and 2 standbys, each of which
has a replication slot.  The master fails; a standby is promoted.  If
the standby has the master's replication slots, that's wrong: perhaps
the OTHER standby's slot should stick around for that standby to
connect to, but the promoted standby's OWN slot on the master
shouldn't be kept around.
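
As a toy model of that scenario, the sketch below filters the old
master's slots at promotion time. Nothing here is an existing
PostgreSQL facility: the function name, the slot names, and above all
the mapping from slot to consuming standby are assumptions. A slot does
not record which node is supposed to use it, so that knowledge has to
come from the administrator.

```python
def slots_to_keep_on_promotion(master_slots, promoted_standby):
    """Pick which of the old master's slots the promoted standby should keep.

    master_slots maps slot name -> name of the standby that consumes it
    (hypothetical bookkeeping kept by the administrator).
    """
    return {
        slot: consumer
        for slot, consumer in master_slots.items()
        # The promoted node's own slot is now meaningless; drop it.
        if consumer != promoted_standby
    }


# One master, two standbys, one slot each (made-up names).
master_slots = {"slot_standby1": "standby1", "slot_standby2": "standby2"}
kept = slots_to_keep_on_promotion(master_slots, "standby1")
print(kept)
```

Copying every slot blindly amounts to skipping the filter, which is
exactly the case where the promoted standby ends up retaining WAL for
a slot nobody will ever connect to.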

It's also part of the idea here that a cascading standby should be
able to have its own slots for its downstream standbys.  It should
retain WAL locally for those standbys, but it should NOT retain WAL
for the master's other standbys.  This definitely doesn't work yet for
logical slots; I'm not sure about physical slots.  But it's part of
the plan, for sure.  Here again, copying the slots from the master is
the wrong thing.

Now, it would be great to have some more technology in this area.  It
would be pretty nifty if we could set things up so that the promotion
process could optionally assume and activate a configurable subset of
the master's slots at failover/switchover time - but the administrator
would need to also make sure those machines were going to reconnect to
the new master.  Or maybe we could find a way to automate that, too,
but either way I think we're going to need something a *lot* more
sophisticated than just copying all the slots.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


