Thread: [HACKERS] pg_basebackups and slots
I've started work on a patch to make pg_basebackup use the temporary slots feature that has been committed (thanks Petr!!). The reason for this is to avoid the cases where a burst of traffic on the master during the backup can cause the receive log part of the basebackup to fall far enough behind that it fails.
I have a few considerations at this point, about interaction with existing options.
Does that seem reasonable? Or would people prefer it to default to off?
Right now, we have -S/--slot which specifies a slot name. If you want to use that, you have to create the slot ahead of time, and it will be a permanent slot (of course). This is primarily documented as a feature to use for replication (to make sure xlog is kept around until the standby is started up), but it does also solve the same problem. But to use it for base backups today you have to manually create the slot, then base backup, then drop the slot, which is error prone.
My thought is that if -S/--slot is not specified, but -X stream is, then we use a temporary slot always. This obviously requires the server to be configured with enough slots (I still think we should change the default here, but that's a different topic), but I think that's acceptable. Then we should add a "--no-slot" to make it revert to previous behaviour.
Does that seem reasonable? Or would people prefer it to default to off?
However, it also made me think that the interface for setting up a replica with a *permanent* slot would be much nicer if we could just have pg_basebackup create the slot, instead of having to create it manually.
Basically, I'm envisioning adding an IF_NOT_EXISTS argument to CREATE_REPLICATION_SLOT, so that pg_basebackup can pass up the slot name it wants to use and have it created if necessary. Do people think think this is worth doing? (It would also make pg_receivexlog and pg_receivelogical easier to use, I think, and it would obviously be implemented there as well).
On Thu, Dec 15, 2016 at 10:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
I've started work on a patch to make pg_basebackup use the temporary slots feature that has been committed (thanks Petr!!). The reason for this is to avoid the cases where a burst of traffic on the master during the backup can cause the receive log part of the basebackup to fall far enough behind that it fails.I have a few considerations at this point, about interaction with existing options.Right now, we have -S/--slot which specifies a slot name. If you want to use that, you have to create the slot ahead of time, and it will be a permanent slot (of course). This is primarily documented as a feature to use for replication (to make sure xlog is kept around until the standby is started up), but it does also solve the same problem. But to use it for base backups today you have to manually create the slot, then base backup, then drop the slot, which is error prone.My thought is that if -S/--slot is not specified, but -X stream is, then we use a temporary slot always. This obviously requires the server to be configured with enough slots (I still think we should change the default here, but that's a different topic), but I think that's acceptable. Then we should add a "--no-slot" to make it revert to previous behaviour.
Does that seem reasonable? Or would people prefer it to default to off?However, it also made me think that the interface for setting up a replica with a *permanent* slot would be much nicer if we could just have pg_basebackup create the slot, instead of having to create it manually.Basically, I'm envisioning adding an IF_NOT_EXISTS argument to CREATE_REPLICATION_SLOT, so that pg_basebackup can pass up the slot name it wants to use and have it created if necessary. Do people think think this is worth doing? (It would also make pg_receivexlog and pg_receivelogical easier to use, I think, and it would obviously be implemented there as well).
Pah, nevermind this last question. I was looking a completely broken branch -- we already have the if-not-exists parameter for pg_receivexlog/pg_receivelogical. I shouldn't write emails before thinking...
On Thu, Dec 15, 2016 at 10:26 AM, Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Dec 15, 2016 at 10:04 AM, Magnus Hagander <magnus@hagander.net> wrote:I've started work on a patch to make pg_basebackup use the temporary slots feature that has been committed (thanks Petr!!). The reason for this is to avoid the cases where a burst of traffic on the master during the backup can cause the receive log part of the basebackup to fall far enough behind that it fails.I have a few considerations at this point, about interaction with existing options.Right now, we have -S/--slot which specifies a slot name. If you want to use that, you have to create the slot ahead of time, and it will be a permanent slot (of course). This is primarily documented as a feature to use for replication (to make sure xlog is kept around until the standby is started up), but it does also solve the same problem. But to use it for base backups today you have to manually create the slot, then base backup, then drop the slot, which is error prone.My thought is that if -S/--slot is not specified, but -X stream is, then we use a temporary slot always. This obviously requires the server to be configured with enough slots (I still think we should change the default here, but that's a different topic), but I think that's acceptable. Then we should add a "--no-slot" to make it revert to previous behaviour.
Does that seem reasonable? Or would people prefer it to default to off?
So here's a patch that does this, for discussion. It implements the following behavior for -X:
* When used with <10.0 servers, behave just like before.
* When -S <name> is specified, behave just like before (use an existing replication slot, fail if it does not exist)
* When used on 10.0 with no -S, create and use a temporary replication slot while running, with name pg_basebackup_<pid>.
* When used with 10.0 with no -S but --no-slot specified, run without a slot like before.
Attachment
> but -X stream is, then we use a temporary slot always.
This seems even more useful with -X fetch to me, as with fetch we are sure we
are not immediately receiving the WAL. So, I'd propose to use it whenever -X
is specified, regardless of which method is specified.
> (I still think we should change the default here, but that's a different topic)
+1
> Does that seem reasonable? Or would people prefer it to default to off?
Yes. Users are specifically requesting wal using -X, so making sure that actually happens by default would work.
On Thu, Dec 15, 2016 at 7:28 PM, Magnus Hagander <magnus@hagander.net> wrote: > So here's a patch that does this, for discussion. It implements the > following behavior for -X: > > * When used with <10.0 servers, behave just like before. > * When -S <name> is specified, behave just like before (use an existing > replication slot, fail if it does not exist) > * When used on 10.0 with no -S, create and use a temporary replication slot > while running, with name pg_basebackup_<pid>. > * When used with 10.0 with no -S but --no-slot specified, run without a slot > like before. There are users using -X stream without a slot now because they don't want to cause WAL retention in pg_xlog and are ready for retries in taking the base backup... I am wondering if it is a good idea to change the default behavior and not introduce a new option like --temporary-slot, or --slot-mode=temp|persistent|none with none being the default instead. There are users caring about pg_xlog not filling up if pg_basebackup hangs on write for example. And it may be a good idea to be able to use --slot-mode=temp with -S/--slot actually to allow users to monitor it. If no slot name is given with --slot generating one looks fine to me. Regarding the patch, it would be good to add a set of TAP tests to cover the new features. I am not seeing anything badly wrong with it at first sight by the way. -- Michael
On Dec 16, 2016 07:27, "Michael Paquier" <michael.paquier@gmail.com> wrote:
There are users using -X stream without a slot now because they don't want to
On Thu, Dec 15, 2016 at 7:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
> So here's a patch that does this, for discussion. It implements the
> following behavior for -X:
>
> * When used with <10.0 servers, behave just like before.
> * When -S <name> is specified, behave just like before (use an existing
> replication slot, fail if it does not exist)
> * When used on 10.0 with no -S, create and use a temporary replication slot
> while running, with name pg_basebackup_<pid>.
> * When used with 10.0 with no -S but --no-slot specified, run without a slot
> like before.
cause WAL retention in pg_xlog and are ready for retries in taking the base
backup... I am wondering if it is a good idea to change the default behavior
and not introduce a new option like --temporary-slot, or
--slot-mode=temp|persistent|none with none being the default instead. There
are users caring about pg_xlog not filling up if pg_basebackup hangs on write
for example. And it may be a good idea to be able to use --slot-mode=temp
with -S/--slot actually to allow users to monitor it. If no slot name is given
with --slot generating one looks fine to me.
I really think the default should be "what most people want", and not "whatever is compatible with a mode that was necessary because we lacked infrastructure".
Yes there are some people who are fine with their backups failing. I'm willing to say there are *many* more who benefit from an automatic slot.
Regarding the patch, it would be good to add a set of TAP tests to cover the
new features. I am not seeing anything badly wrong with it at first sight by
the way
I don't really know how to write a good test for it. I mean, we could run it with the parameter, but how to we make it actually verify the slot? Make a really big database to make it guaranteed to be slow enough that we can notice it in a different session?
/Magnus
At 2016-12-16 07:32:24 +0100, magnus@hagander.net wrote: > > I really think the default should be "what most people want", and not > "whatever is compatible with a mode that was necessary because we > lacked infrastructure". Very much agreed. -- Abhijit
On Fri, Dec 16, 2016 at 3:32 PM, Magnus Hagander <magnus@hagander.net> wrote: > I don't really know how to write a good test for it. I mean, we could run it > with the parameter, but how to we make it actually verify the slot? Make a > really big database to make it guaranteed to be slow enough that we can > notice it in a different session? No, not that. I was meaning tests to trigger the error code paths with the compatibility switches you are adding. -- Michael
On 16/12/16 07:32, Magnus Hagander wrote: > > On Dec 16, 2016 07:27, "Michael Paquier" <michael.paquier@gmail.com > <mailto:michael.paquier@gmail.com>> wrote: > > > > On Thu, Dec 15, 2016 at 7:28 PM, Magnus Hagander > <magnus@hagander.net <mailto:magnus@hagander.net>> wrote: > > So here's a patch that does this, for discussion. It implements the > > following behavior for -X: > > > > * When used with <10.0 servers, behave just like before. > > * When -S <name> is specified, behave just like before (use an > existing > > replication slot, fail if it does not exist) > > * When used on 10.0 with no -S, create and use a temporary > replication slot > > while running, with name pg_basebackup_<pid>. > > * When used with 10.0 with no -S but --no-slot specified, run > without a slot > > like before. > > There are users using -X stream without a slot now because they > don't want to > cause WAL retention in pg_xlog and are ready for retries in taking > the base > backup... I am wondering if it is a good idea to change the default > behavior > and not introduce a new option like --temporary-slot, or > --slot-mode=temp|persistent|none with none being the default > instead. There > are users caring about pg_xlog not filling up if pg_basebackup hangs > on write > for example. And it may be a good idea to be able to use > --slot-mode=temp > with -S/--slot actually to allow users to monitor it. If no slot > name is given > with --slot generating one looks fine to me. > > > I really think the default should be "what most people want", and not > "whatever is compatible with a mode that was necessary because we lacked > infrastructure". +1 (btw glad you made the patch) > > Yes there are some people who are fine with their backups failing. I'm > willing to say there are *many* more who benefit from an automatic slot. > I think the issue with slots taking over disk space should be fixed by making it possible to cut off slots when necessary (ie some GUC that would say how much behind slot can be at maximum, with something like 0 being unlimited like it's now) instead of trying to work around that in every client. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 12/15/16 4:04 AM, Magnus Hagander wrote: > This obviously requires the server to be configured with enough slots (I > still think we should change the default here, but that's a different > topic), but I think that's acceptable. We could implicitly reserve one replication slot per walsender. And then max_replication_slots (possibly renamed) would just be additional slots beyond that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Dec 16, 2016 at 7:40 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Dec 16, 2016 at 3:32 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I don't really know how to write a good test for it. I mean, we could run it
> with the parameter, but how to we make it actually verify the slot? Make a
> really big database to make it guaranteed to be slow enough that we can
> notice it in a different session?
No, not that. I was meaning tests to trigger the error code paths with
the compatibility switches you are adding.
There is only one switch - --no-slot. I guess you mean rune one backup with that one? Sure, that's easy enough to do. I'm not sure how much it really tests, but let's do it :)
You mean something like this, right?
Attachment
On Fri, Dec 16, 2016 at 12:41 PM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
-- On 16/12/16 07:32, Magnus Hagander wrote:
>
> On Dec 16, 2016 07:27, "Michael Paquier" <michael.paquier@gmail.com
> <mailto:michael.paquier@gmail.com>> wrote:
>
>
>
> On Thu, Dec 15, 2016 at 7:28 PM, Magnus Hagander+1 (btw glad you made the patch)> <magnus@hagander.net <mailto:magnus@hagander.net>> wrote:
> > So here's a patch that does this, for discussion. It implements the
> > following behavior for -X:
> >
> > * When used with <10.0 servers, behave just like before.
> > * When -S <name> is specified, behave just like before (use an
> existing
> > replication slot, fail if it does not exist)
> > * When used on 10.0 with no -S, create and use a temporary
> replication slot
> > while running, with name pg_basebackup_<pid>.
> > * When used with 10.0 with no -S but --no-slot specified, run
> without a slot
> > like before.
>
> There are users using -X stream without a slot now because they
> don't want to
> cause WAL retention in pg_xlog and are ready for retries in taking
> the base
> backup... I am wondering if it is a good idea to change the default
> behavior
> and not introduce a new option like --temporary-slot, or
> --slot-mode=temp|persistent|none with none being the default
> instead. There
> are users caring about pg_xlog not filling up if pg_basebackup hangs
> on write
> for example. And it may be a good idea to be able to use
> --slot-mode=temp
> with -S/--slot actually to allow users to monitor it. If no slot
> name is given
> with --slot generating one looks fine to me.
>
>
> I really think the default should be "what most people want", and not
> "whatever is compatible with a mode that was necessary because we lacked
> infrastructure".
I'm glad you made the temp slots, so I could abandon that branch :)
> Yes there are some people who are fine with their backups failing. I'm
> willing to say there are *many* more who benefit from an automatic slot.
>
I think the issue with slots taking over disk space should be fixed by
making it possible to cut off slots when necessary (ie some GUC that
would say how much behind slot can be at maximum, with something like 0
being unlimited like it's now) instead of trying to work around that in
every client.
Agreed, that seems like a better solution.
On Sat, Dec 17, 2016 at 3:54 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 12/15/16 4:04 AM, Magnus Hagander wrote:
> This obviously requires the server to be configured with enough slots (I
> still think we should change the default here, but that's a different
> topic), but I think that's acceptable.
We could implicitly reserve one replication slot per walsender. And
then max_replication_slots (possibly renamed) would just be additional
slots beyond that.
That makes a lot of sense now that we have temporary replication slots, I agree. And then max_replication_slots could be something like persistent_replication_slots?
On 12/17/16 9:55 AM, Magnus Hagander wrote: > That makes a lot of sense now that we have temporary replication slots, > I agree. And then max_replication_slots could be something like > persistent_replication_slots? I was thinking was more like that each walsender gets one fixed slot, which can be temporary or persistent. Or maybe default max_replication_slots to something like -1, which means equal to max_wal_senders. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
This patch looks reasonable to me. Attached is a top-up patch with a few small fixups. I suggest to wait for the resolution of the "Replication/backup defaults" thread. I would not want to be in a situation where users who have not been trained to use replication slots now have yet another restart-only parameter to set before they can take a backup. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
This patch looks reasonable to me.
Attached is a top-up patch with a few small fixups.
I suggest to wait for the resolution of the "Replication/backup
defaults" thread. I would not want to be in a situation where users who
have not been trained to use replication slots now have yet another
restart-only parameter to set before they can take a backup.
Thanks for the review and the top-up patch. I've applied it and pushed.
I just realized I missed crediting you with the review and the top-up in the commit message. My apologies for this.
On Mon, Jan 16, 2017 at 10:00 PM, Magnus Hagander <magnus@hagander.net> wrote: > > > On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> >> This patch looks reasonable to me. >> >> Attached is a top-up patch with a few small fixups. >> >> I suggest to wait for the resolution of the "Replication/backup >> defaults" thread. I would not want to be in a situation where users who >> have not been trained to use replication slots now have yet another >> restart-only parameter to set before they can take a backup. >> > > Thanks for the review and the top-up patch. I've applied it and pushed. - if (replication_slot && includewal != STREAM_WAL) + if ((replication_slot || no_slot) && includewal != STREAM_WAL) Why do we need to check "no_slot" here? Since no_slot=true means that no temporary replication slot is specified, it's fine even with both -X none and fetch. Regards, -- Fujii Masao
On Mon, Jan 16, 2017 at 4:33 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Jan 16, 2017 at 10:00 PM, Magnus Hagander <magnus@hagander.net> wrote:
>
>
> On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>>
>> This patch looks reasonable to me.
>>
>> Attached is a top-up patch with a few small fixups.
>>
>> I suggest to wait for the resolution of the "Replication/backup
>> defaults" thread. I would not want to be in a situation where users who
>> have not been trained to use replication slots now have yet another
>> restart-only parameter to set before they can take a backup.
>>
>
> Thanks for the review and the top-up patch. I've applied it and pushed.
- if (replication_slot && includewal != STREAM_WAL)
+ if ((replication_slot || no_slot) && includewal != STREAM_WAL)
Why do we need to check "no_slot" here? Since no_slot=true means that
no temporary replication slot is specified, it's fine even with both -X none
and fetch.
Um yeah, that looks like a bad merge actually. Will fix.