Re: Including replication slot data in base backups - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Including replication slot data in base backups |
Date | |
Msg-id | CA+Tgmoau5MzCSU3CMp2Jnh+8jYHWAgc1KRWtGAJXQveg8KMx8g@mail.gmail.com |
In response to | Re: Including replication slot data in base backups (Magnus Hagander <magnus@hagander.net>) |
Responses |
Re: Including replication slot data in base backups
|
List | pgsql-hackers |
On Tue, Apr 1, 2014 at 10:45 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Apr 1, 2014 at 2:24 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>>
>> As of now, pg_basebackup creates an empty repository for pg_replslot/
>> in a base backup, forcing the user to recreate slots on other nodes of
>> the cluster with pg_create_*_replication_slot, or copy pg_replslot
>> from another node. This is not really user-friendly especially after a
>> failover where a given slave may not have the replication slot
>> information of the master that it is replacing.
>>
>> The simple patch attached adds a new option in pg_basebackup, called
>> --replication-slot, allowing to include replication slot information
>> in a base backup. This is done by extending the command BASE_BACKUP in
>> the replication protocol.
>
> --replication-slots would be a better name (plural), or probably
> --include-replication-slots. (and that comment also goes for the BASE_BACKUP
> syntax and variables)
>
> But. If you want to solve the failover case, doesn't that mean you need to
> include it in the *replication* stream and not just the base backup?
> Otherwise, what you're sending over might well be out of date set of slots
> once the failover happens? What if the set of replication slots change
> between the time of the basebackup and the failover?

As a general comment, I think that replication slots, while a great
feature, have more than the usual potential for self-inflicted injury.
A replication slot prevents the global xmin from advancing (so your
tables will bloat) and WAL from being removed (so your pg_xlog
directory will fill up and take down the server). The very last thing
you want to do is to keep around a replication slot that should have
been dropped, and I suspect a decent number of users are going to make
that mistake, just as they do with prepared transactions and backends
left idle in transaction. So I view this proposal with a bit of
skepticism for that reason.
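To make the "slot that should have been dropped" risk concrete, here is a minimal sketch of the kind of check an administrator might run. It assumes the rows have already been fetched from the pg_replication_slots view into plain dictionaries with the keys slot_name, active and restart_lsn; the function names and the threshold parameter are hypothetical and not part of PostgreSQL or of the proposed patch.

```python
from typing import Dict, List


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' to a byte position.

    The text form is two hexadecimal numbers: the high and the low
    32 bits of the 64-bit WAL position.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def suspicious_slots(slots: List[Dict], current_lsn: str,
                     max_retained_bytes: int) -> List[Dict]:
    """Return inactive slots holding back more WAL than the threshold.

    `slots` is assumed to be a list of dicts shaped like rows of
    pg_replication_slots (hypothetical pre-fetched input, no database
    access happens here).
    """
    current = lsn_to_int(current_lsn)
    flagged = []
    for slot in slots:
        restart_lsn = slot.get("restart_lsn")
        # An active slot is being consumed; a slot without a restart_lsn
        # has not reserved any WAL yet.
        if slot.get("active") or restart_lsn is None:
            continue
        retained = current - lsn_to_int(restart_lsn)
        if retained > max_retained_bytes:
            flagged.append({"slot_name": slot["slot_name"],
                            "retained_bytes": retained})
    return flagged
```

This only reports WAL retention; the xmin side of the problem would need a separate check, and deciding whether a flagged slot is really abandoned remains a judgment call for the administrator.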
If you end up copying the replication slots when you didn't really
want to, or when you only wanted some of them, you will be sad. In
particular, suppose you have a master and 2 standbys, each of which
has a replication slot. The master fails; a standby is promoted. If
the standby has the master's replication slots, that's wrong: perhaps
the OTHER standby's slot should stick around for the standby to
connect to, but the standby's OWN slot on the master shouldn't be kept
around.

It's also part of the idea here that a cascading standby should be
able to have its own slots for its downstream standbys. It should
retain WAL locally for those standbys, but it should NOT retain WAL
for the master's other standbys. This definitely doesn't work yet for
logical slots; I'm not sure about physical slots. But it's part of
the plan, for sure. Here again, copying the slots from the master is
the wrong thing.

Now, it would be great to have some more technology in this area. It
would be pretty nifty if we could set things up so that the promotion
process could optionally assume and activate a configurable subset of
the master's slots at failover/switchover time - but the administrator
would need to also make sure those machines were going to reconnect to
the new master. Or maybe we could find a way to automate that, too,
but either way I think we're going to need something a *lot* more
sophisticated than just copying all the slots.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
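The master-with-two-standbys case above can be sketched as a small selection rule. This is only an illustration of the "configurable subset" idea, not an existing PostgreSQL mechanism: the function, its parameters and the slot names in the usage note are all hypothetical.

```python
from typing import Iterable, List, Optional, Set


def slots_to_assume(master_slots: Iterable[str],
                    own_slot: str,
                    allowed: Optional[Set[str]] = None) -> List[str]:
    """Pick which of the old master's slots a promoted standby keeps.

    - The promoted standby's own slot is always dropped: nothing will
      consume from it again, so it would retain WAL forever.
    - If `allowed` is given, only slots named in it are kept; this
      stands in for the administrator-configured subset.
    """
    kept = []
    for name in master_slots:
        if name == own_slot:
            continue
        if allowed is not None and name not in allowed:
            continue
        kept.append(name)
    return kept
```

For example, with slots "standby_a" and "standby_b" on the master and "standby_a" promoted, the rule keeps only "standby_b". Whether the other standby actually reconnects to the new master is outside what such a rule can guarantee.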