Thread: Copy function for logical replication slots
Hi,

I'd like to propose a copy function for logical replication slots. Currently, when we create a new logical replication slot it starts to read WAL from the current insert LSN. This function copies an existing logical replication slot while optionally changing its output plugin and persistence. That is, the copied replication slot starts from the same LSN as the source one, so we don't need to care about WAL reservation.

A use case I imagined is for investigations for example. I mean that when replication collision occurs on subscriber there is no way to see what replicated data is conflicting (perhaps error log helps it but is not detailed) and there is no way to advance a replication origin in order to exactly skip to apply conflicting data. By creating a new logical slot with a different output plugin at the same LSN, we can see what data a replication slot will decode (and send) and those LSNs as well. This function will help for that purpose.

Here are execution samples.

postgres(1:17715)=# select pg_create_logical_replication_slot('orig_slot', 'test_decoding');
 pg_create_logical_replication_slot
------------------------------------
 (orig_slot,0/164A410)
(1 row)

Time: 17.759 ms
postgres(1:17715)=# select pg_copy_logical_replication_slot('orig_slot', 'copy1_slot');
 pg_copy_logical_replication_slot
----------------------------------
 (copy1_slot,0/164A410)
(1 row)

Time: 6.074 ms
postgres(1:17715)=# select pg_copy_logical_replication_slot('orig_slot', 'copy2_slot', 'wal2json');
 pg_copy_logical_replication_slot
----------------------------------
 (copy2_slot,0/164A410)
(1 row)

Time: 6.201 ms
postgres(1:17715)=# select pg_copy_logical_replication_slot('orig_slot', 'copy3_slot', 'wal2json', true);
 pg_copy_logical_replication_slot
----------------------------------
 (copy3_slot,0/164A410)
(1 row)

Time: 5.071 ms
postgres(1:17715)=# select * from pg_replication_slots ;
 slot_name  |    plugin     | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
------------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 copy3_slot | wal2json      | logical   |  13237 | postgres | t         | t      |      17715 |      |          568 | 0/164A3D8   | 0/164A410
 copy2_slot | wal2json      | logical   |  13237 | postgres | f         | f      |            |      |          568 | 0/164A3D8   | 0/164A410
 copy1_slot | orig_slot     | logical   |  13237 | postgres | f         | f      |            |      |          568 | 0/164A3D8   | 0/164A410
 orig_slot  | test_decoding | logical   |  13237 | postgres | f         | f      |            |      |          568 | 0/164A3D8   | 0/164A410
(4 rows)

Feedback is very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
On Thu, Jun 28, 2018 at 11:51:20AM +0900, Masahiko Sawada wrote: > A use case I imagined is for investigations for example. I mean that > when replication collision occurs on subscriber there is no way to see > what replicated data is conflicting (perhaps error log helps it but is > not detailed) and there is no way to advance a replication origin in > order to exactly skip to apply conflicting data. By creating a new > logical slot with a different output plugin at the same LSN, we can > see what data a replication slot will decode (and send) and those LSNs > as well. This function will help for that purpose. Hm. Shouldn't the original slot copied be owned by the process doing the copy with ReplicationSlotAcquire? There could be some cases where copying a physical slot also makes sense. -- Michael
Attachment
On Thu, Jun 28, 2018 at 12:29 PM, Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Jun 28, 2018 at 11:51:20AM +0900, Masahiko Sawada wrote: >> A use case I imagined is for investigations for example. I mean that >> when replication collision occurs on subscriber there is no way to see >> what replicated data is conflicting (perhaps error log helps it but is >> not detailed) and there is no way to advance a replication origin in >> order to exactly skip to apply conflicting data. By creating a new >> logical slot with a different output plugin at the same LSN, we can >> see what data a replication slot will decode (and send) and those LSNs >> as well. This function will help for that purpose. > > Hm. Shouldn't the original slot copied be owned by the process doing > the copy with ReplicationSlotAcquire? Right, it should do and release it before creating new one. > There could be some cases where > copying a physical slot also makes sense. I've thought that but I didn't find concrete use case. That's why I started with only logical slot. Attached v2 patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
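A minimal sketch of the acquire-then-release flow agreed on above, assuming the slot.c interfaces of that era (ReplicationSlotAcquire/ReplicationSlotRelease and the MyReplicationSlot global); the variable names here (src_name, src_restart_lsn) are illustrative, not taken from the attached patch:

XLogRecPtr  src_restart_lsn;

/* Own the source slot so it cannot be dropped concurrently. */
ReplicationSlotAcquire(src_name, true);

/* Remember where the source slot currently starts reading WAL. */
SpinLockAcquire(&MyReplicationSlot->mutex);
src_restart_lsn = MyReplicationSlot->data.restart_lsn;
SpinLockRelease(&MyReplicationSlot->mutex);

/* Release it before creating the copy ... */
ReplicationSlotRelease();

/* ... and create the destination slot starting at src_restart_lsn. */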
On Thu, Jun 28, 2018 at 03:34:00PM +0900, Masahiko Sawada wrote: > On Thu, Jun 28, 2018 at 12:29 PM, Michael Paquier <michael@paquier.xyz> wrote: >> Hm. Shouldn't the original slot copied be owned by the process doing >> the copy with ReplicationSlotAcquire? > > Right, it should do and release it before creating new one. Yes, or MyReplicationSlot would not like that. >> There could be some cases where >> copying a physical slot also makes sense. > > I've thought that but I didn't find concrete use case. That's why I > started with only logical slot. Let's imagine the case of a single base backup which is associated to a given replication slot, and that this backup is then used to spawn multiple standbys where each one of them needs a separate slot to consume changes at their pace. If you can copy the slot used in the first backup, then both nodes could consume it. That looks useful to me to make sure that both slots are based a consistent point. -- Michael
Attachment
On 6/28/18 08:47, Michael Paquier wrote: >>> There could be some cases where >>> copying a physical slot also makes sense. >> I've thought that but I didn't find concrete use case. That's why I >> started with only logical slot. > Let's imagine the case of a single base backup which is associated to a > given replication slot, and that this backup is then used to spawn > multiple standbys where each one of them needs a separate slot to > consume changes at their pace. If you can copy the slot used in the > first backup, then both nodes could consume it. That looks useful to > me to make sure that both slots are based a consistent point. I think this use case of cloning replicas would also be interesting in the logical slot case. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 28, 2018 at 5:37 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 6/28/18 08:47, Michael Paquier wrote: >>>> There could be some cases where >>>> copying a physical slot also makes sense. >>> I've thought that but I didn't find concrete use case. That's why I >>> started with only logical slot. >> Let's imagine the case of a single base backup which is associated to a >> given replication slot, and that this backup is then used to spawn >> multiple standbys where each one of them needs a separate slot to >> consume changes at their pace. If you can copy the slot used in the >> first backup, then both nodes could consume it. That looks useful to >> me to make sure that both slots are based a consistent point. > Thank you, that sounds useful. I'll update the patch to include physical slots. > I think this use case of cloning replicas would also be interesting in > the logical slot case. > +1 Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Jun 28, 2018 at 7:10 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Thu, Jun 28, 2018 at 5:37 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 6/28/18 08:47, Michael Paquier wrote: >>>>> There could be some cases where >>>>> copying a physical slot also makes sense. >>>> I've thought that but I didn't find concrete use case. That's why I >>>> started with only logical slot. >>> Let's imagine the case of a single base backup which is associated to a >>> given replication slot, and that this backup is then used to spawn >>> multiple standbys where each one of them needs a separate slot to >>> consume changes at their pace. If you can copy the slot used in the >>> first backup, then both nodes could consume it. That looks useful to >>> me to make sure that both slots are based a consistent point. >> > > Thank you, that sounds useful. I'll update the patch to include physical slots. > >> I think this use case of cloning replicas would also be interesting in >> the logical slot case. >> > > +1 > Attached an updated patch including copy function support for logical slots as well as physical slots. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Mon, Jul 02, 2018 at 04:31:32PM +0900, Masahiko Sawada wrote: > Attached an updated patch including copy function support for logical > slots as well as physical slots. Please review it. I had a look at this patch. As the output plugin can be changed for logical slots, having two functions is a good thing. +# Basic decoding works using the copied slot +$result = $node_master->safe_psql('postgres', + qq[SELECT pg_logical_slot_get_changes('copy_slot', NULL, NULL);]); +is(scalar(@foobar = split /^/m, $result), + 12, 'Decoding produced 12 rows inc BEGIN/COMMIT using the copied slot'); This could live as well as part of the test suite for test_decoding, and that would be faster as well. There are more scenarios that could be tested as well: - Copy a temporary slot to become permanent and switch its plugin type, then check the state of the slot in pg_replication_slots. - Do the reverse, aka copy a permanent slot and make it temporary. - Checking that the default behaviors are preserved: if persistent is not changed then the new slot should have the same persistence as the origin. The same applies for the output plugin, and for both logical and replication slots. - Check that the reversed LSN is the same for both the origin and the target. + Copy a existing <parameter>src_slot_name</parameter> physical slot + to <parameter>dst_slot_name</parameter>. The copied physical slot starts + to reserve WAL from the same LSN as the source slot if the source slot + already reserves WAL. <parameter>temporary</parameter> is optional. If + <parameter>temporary</parameter> is omitted, the same value of the + source slot is used. Typo here. Copy AN existing slot. I would reword that a bit: Copies an existing physical replication slot name src_slot_name to a physical replication slot named dst_slot_name. Extra one: "the same value AS the source slot is used." + Copy a existing <parameter>src_slot_name</parameter> logical (decoding) slot + to <parameter>dst_slot_name</parameter> while changing the output plugin + and persistence. There may be no need for "decoding" here, the same phrasing as suggested above would be nicer I think. For LSN I would suggest to add an <acronym> markup. I am not sure if it makes much sense to be able to copy from a slot which has not yet been used to reserve WAL, but to change the properties of a slot I could get that forbidding the behavior is not really intuitive either. - ReplicationSlotReserveWal(); + /* Find start location to read WAL if not specified */ + if (XLogRecPtrIsInvalid(start_lsn)) + ReplicationSlotReserveWal(); + else + { + SpinLockAcquire(&slot->mutex); + slot->data.restart_lsn = start_lsn; + SpinLockRelease(&slot->mutex); + } Hmm. Instead of all this stanza in CreateInitDecodingContext(), I would have imagined that what should happen is that the new fresh slot gets created with the same status data as the origin. This saves quite a bit of extra post-creation computing, and we are also sure that the origin slot has consistent data as it is owned by the process doing the copy. One property which seems important to me is to make sure that the target slot has the same data as the origin slot once the caller knows that the copy has completed, and not that the target slot may perhaps have the same data as the origin while creating the target. Do you see the difference? 
Your patch does the latter, because it creates the new slot after releasing the lock it held on the origin, while I would think that the former is more important, aka keep the lock for the duration of the copy. I am not completely sure if that's a property we want to keep, but that deserves discussion as we should not do a copy while the origin slot may still be consumed in parallel. For physical slot the copy is straight-forward, less for logical slots. -- Michael
Attachment
On Tue, Jul 3, 2018 at 1:01 PM, Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Jul 02, 2018 at 04:31:32PM +0900, Masahiko Sawada wrote: >> Attached an updated patch including copy function support for logical >> slots as well as physical slots. Please review it. > > I had a look at this patch. Thank you for the reviews. > > As the output plugin can be changed for logical slots, having two > functions is a good thing. > > +# Basic decoding works using the copied slot > +$result = $node_master->safe_psql('postgres', > + qq[SELECT pg_logical_slot_get_changes('copy_slot', NULL, NULL);]); > +is(scalar(@foobar = split /^/m, $result), > + 12, 'Decoding produced 12 rows inc BEGIN/COMMIT using the copied slot'); > > This could live as well as part of the test suite for test_decoding, and > that would be faster as well. There are more scenarios that could be > tested as well: > - Copy a temporary slot to become permanent and switch its plugin type, > then check the state of the slot in pg_replication_slots. > - Do the reverse, aka copy a permanent slot and make it temporary. > - Checking that the default behaviors are preserved: if persistent is > not changed then the new slot should have the same persistence as the > origin. The same applies for the output plugin, and for both logical > and replication slots. > - Check that the reversed LSN is the same for both the origin and the > target. Will fix. > > + Copy a existing <parameter>src_slot_name</parameter> physical slot > + to <parameter>dst_slot_name</parameter>. The copied physical slot starts > + to reserve WAL from the same LSN as the source slot if the source slot > + already reserves WAL. <parameter>temporary</parameter> is optional. If > + <parameter>temporary</parameter> is omitted, the same value of the > + source slot is used. > > Typo here. Copy AN existing slot. I would reword that a bit: > Copies an existing physical replication slot name src_slot_name to a > physical replication slot named dst_slot_name. > Extra one: "the same value AS the source slot is used." > > + Copy a existing <parameter>src_slot_name</parameter> logical (decoding) slot > + to <parameter>dst_slot_name</parameter> while changing the output plugin > + and persistence. > > There may be no need for "decoding" here, the same phrasing as suggested > above would be nicer I think. For LSN I would suggest to add an > <acronym> markup. Will fix. > > I am not sure if it makes much sense to be able to copy from a slot > which has not yet been used to reserve WAL, but to change the properties > of a slot I could get that forbidding the behavior is not really > intuitive either. > > - ReplicationSlotReserveWal(); > + /* Find start location to read WAL if not specified */ > + if (XLogRecPtrIsInvalid(start_lsn)) > + ReplicationSlotReserveWal(); > + else > + { > + SpinLockAcquire(&slot->mutex); > + slot->data.restart_lsn = start_lsn; > + SpinLockRelease(&slot->mutex); > + } > Hmm. Instead of all this stanza in CreateInitDecodingContext(), I would > have imagined that what should happen is that the new fresh slot gets > created with the same status data as the origin. This saves quite a bit > of extra post-creation computing, and we are also sure that the origin > slot has consistent data as it is owned by the process doing the copy. Hmm, such post-creation computing is not necessary for the copied slot? I'm concerned we might miss some operations required for for logical replication slot. 
> > One property which seems important to me is to make sure that the target > slot has the same data as the origin slot once the caller knows that the > copy has completed, and not that the target slot may perhaps have the > same data as the origin while creating the target. Do you see the > difference? Your patch does the latter, because it creates the new slot > after releasing the lock it held on the origin, while I would think that > the former is more important, aka keep the lock for the duration of the > copy. Did you mean "the caller" is clients who call pg_copy_logical_replication_slot function? If so, I think it's very difficult to ensure the former because the origin slot can advance during returning the result to the client. Also if we keep the lock on the origin slot during the copy I think that one problem is that we will own two replication slots at a time. But there are a lot of code that premise a process holds at most one replication slot. >I am not completely sure if that's a property we want to keep, > but that deserves discussion as we should not do a copy while the origin > slot may still be consumed in parallel. For physical slot the copy is > straight-forward, less for logical slots. I think this copy functions ensure that the target slot has the same values as the origin slot at the time when the function is called. Isn't it clear? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 28 June 2018 at 10:51, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Hi,
I'd like to propose a copy function for logical replication slots.
Currently, when we create a new logical replication slot it starts to
read WAL from the current insert LSN. This function copies an existing
logical replication slot while optionally changing its output plugin
and persistence. That is, the copied replication slot starts from the
same LSN as the source one, so we don't need to care about WAL
reservation.
Strong agreement from me.
The inability to switch plugins is a massive pain. I've worked around it with similar functions bundled into the extensions I work with that do logical decoding related work. It makes sense to have it in core.
A use case I imagined is for investigations for example. I mean that
when replication collision occurs on subscriber there is no way to see
what replicated data is conflicting (perhaps error log helps it but is
not detailed) and there is no way to advance a replication origin in
order to exactly skip to apply conflicting data.
pglogical handles this by letting you peek the slot in a separate json protocol output mode, but that doesn't help with pgoutput.
On Tue, Jul 03, 2018 at 05:27:07PM +0900, Masahiko Sawada wrote: > On Tue, Jul 3, 2018 at 1:01 PM, Michael Paquier <michael@paquier.xyz> wrote: >> One property which seems important to me is to make sure that the target >> slot has the same data as the origin slot once the caller knows that the >> copy has completed, and not that the target slot may perhaps have the >> same data as the origin while creating the target. Do you see the >> difference? Your patch does the latter, because it creates the new slot >> after releasing the lock it held on the origin, while I would think that >> the former is more important, aka keep the lock for the duration of the >> copy. > > Did you mean "the caller" is clients who call > pg_copy_logical_replication_slot function? If so, I think it's very > difficult to ensure the former because the origin slot can advance > during returning the result to the client. Also if we keep the lock on > the origin slot during the copy I think that one problem is that we > will own two replication slots at a time. But there are a lot of code > that premise a process holds at most one replication slot. When I mean the caller, I mean the client in charge of the slot creation. Anyway, this bit is really worrying me in your patch: - ReplicationSlotReserveWal(); + /* Find start location to read WAL if not specified */ + if (XLogRecPtrIsInvalid(start_lsn)) + ReplicationSlotReserveWal(); + else + { + SpinLockAcquire(&slot->mutex); + slot->data.restart_lsn = start_lsn; + SpinLockRelease(&slot->mutex); + } Your patch simply increases the window mentioned here in slot.c because there is no interlock between the checkpointer and the process creating the slot. Please see here: /* * The replication slot mechanism is used to prevent removal of required * WAL. As there is no interlock between this routine and checkpoints, WAL * segments could concurrently be removed when a now stale return value of * ReplicationSlotsComputeRequiredLSN() is used. In the unlikely case that * this happens we'll just retry. */ So, slots created without a non-arbitrary start position have at least the will to restart if a checkpointer is running in parallel, but your patch increases the window used, and does not even retry nor does it fail to create the slot if the restart LSN points to a segment not here anymore. The trick I think here is that the slot copy should not cause checkpoint slowdowns, so more thoughts are needed here, and this is not really trivial. >> I am not completely sure if that's a property we want to keep, >> but that deserves discussion as we should not do a copy while the origin >> slot may still be consumed in parallel. For physical slot the copy is >> straight-forward, less for logical slots. > > I think this copy functions ensure that the target slot has the same > values as the origin slot at the time when the function is called. > Isn't it clear? Not really per the point above, the current version of the copy gives no actual consistency guarantees. -- Michael
Attachment
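As a rough illustration of the kind of check being asked for here (later patch versions in this thread end up doing essentially this), slot creation with a caller-supplied restart_lsn could verify that the corresponding WAL segment still exists and fail, rather than retry, when it has already been removed; `slot` here is the slot being created:

XLogSegNo   segno;

XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
if (XLogGetLastRemovedSegno() >= segno)
{
    char    filename[MAXFNAMELEN];

    XLogFileName(filename, ThisTimeLineID, segno, wal_segment_size);
    ereport(ERROR,
            (errcode(ERRCODE_UNDEFINED_FILE),
             errmsg("requested WAL segment %s has already been removed",
                    filename)));
}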
On Thu, Jul 5, 2018 at 4:47 PM, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Jul 03, 2018 at 05:27:07PM +0900, Masahiko Sawada wrote: >> On Tue, Jul 3, 2018 at 1:01 PM, Michael Paquier <michael@paquier.xyz> wrote: >>> One property which seems important to me is to make sure that the target >>> slot has the same data as the origin slot once the caller knows that the >>> copy has completed, and not that the target slot may perhaps have the >>> same data as the origin while creating the target. Do you see the >>> difference? Your patch does the latter, because it creates the new slot >>> after releasing the lock it held on the origin, while I would think that >>> the former is more important, aka keep the lock for the duration of the >>> copy. >> >> Did you mean "the caller" is clients who call >> pg_copy_logical_replication_slot function? If so, I think it's very >> difficult to ensure the former because the origin slot can advance >> during returning the result to the client. Also if we keep the lock on >> the origin slot during the copy I think that one problem is that we >> will own two replication slots at a time. But there are a lot of code >> that premise a process holds at most one replication slot. > > When I mean the caller, I mean the client in charge of the slot > creation. Anyway, this bit is really worrying me in your patch: > - ReplicationSlotReserveWal(); > + /* Find start location to read WAL if not specified */ > + if (XLogRecPtrIsInvalid(start_lsn)) > + ReplicationSlotReserveWal(); > + else > + { > + SpinLockAcquire(&slot->mutex); > + slot->data.restart_lsn = start_lsn; > + SpinLockRelease(&slot->mutex); > + } > Your patch simply increases the window mentioned here in slot.c because > there is no interlock between the checkpointer and the process creating > the slot. Please see here: > /* > * The replication slot mechanism is used to prevent removal of required > * WAL. As there is no interlock between this routine and checkpoints, WAL > * segments could concurrently be removed when a now stale return value of > * ReplicationSlotsComputeRequiredLSN() is used. In the unlikely case that > * this happens we'll just retry. > */ > So, slots created without a non-arbitrary start position have at least > the will to restart if a checkpointer is running in parallel, but your > patch increases the window used, and does not even retry nor does it > fail to create the slot if the restart LSN points to a segment not here > anymore. The trick I think here is that the slot copy should not cause > checkpoint slowdowns, so more thoughts are needed here, and this is not > really trivial. Yes, you're right. To guarantee that restart LSN of copied slot is available, it seems to me that it's better to copy new slot while holding the origin slot as you mentioned before. Since the replication slot creation code assumes that a process creating a new slot doesn't have any slots we should save origin slot temporary and create new one, and then restore it. It might be a bit tricky but would work fine. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
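A rough sketch of the save-and-restore idea described above (illustrative only; `src_name` and `savedslot` are placeholder names, and the acquire/release routines are the usual slot.c ones): hold the source slot for the whole copy so its WAL cannot be removed, and work around the one-slot-per-process assumption by stashing MyReplicationSlot while the destination slot is created.

ReplicationSlot *savedslot;

ReplicationSlotAcquire(src_name, true);
savedslot = MyReplicationSlot;          /* stash the acquired source slot */
MyReplicationSlot = NULL;

/* ... create the destination slot starting at savedslot->data.restart_lsn ... */

ReplicationSlotRelease();               /* release the newly created slot */

/* Restore and release the source slot. */
MyReplicationSlot = savedslot;
ReplicationSlotRelease();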
On Thu, Jul 05, 2018 at 05:24:48PM +0900, Masahiko Sawada wrote: > Yes, you're right. To guarantee that restart LSN of copied slot is > available, it seems to me that it's better to copy new slot while > holding the origin slot as you mentioned before. Since the replication > slot creation code assumes that a process creating a new slot doesn't > have any slots we should save origin slot temporary and create new > one, and then restore it. This will require some refactoring first I think as most of the slot routines assume that the process owning it is the one doing the calls, so this has a string smell of a patch set being splitted. > It might be a bit tricky but would work fine. Sawada-san, will you be able to rewrite this patch soon or should it be moved to the next commit fest? I would suggest to do the latter as this is no small work, and this needs careful thoughts. -- Michael
Attachment
On Fri, Jul 6, 2018 at 9:21 AM, Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Jul 05, 2018 at 05:24:48PM +0900, Masahiko Sawada wrote: >> Yes, you're right. To guarantee that restart LSN of copied slot is >> available, it seems to me that it's better to copy new slot while >> holding the origin slot as you mentioned before. Since the replication >> slot creation code assumes that a process creating a new slot doesn't >> have any slots we should save origin slot temporary and create new >> one, and then restore it. > > This will require some refactoring first I think as most of the slot > routines assume that the process owning it is the one doing the calls, > so this has a string smell of a patch set being splitted. > >> It might be a bit tricky but would work fine. > > Sawada-san, will you be able to rewrite this patch soon or should it be > moved to the next commit fest? I would suggest to do the latter as this > is no small work, and this needs careful thoughts. I think that this patch might be splitted but I will be able to send an updated patch in the next week. As you suggestion this patch needs more careful thoughts. I'll move this patch to the next commit fest if I will not be able to sent it. Is that okay? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Jul 09, 2018 at 10:06:00AM +0900, Masahiko Sawada wrote: > I think that this patch might be splitted but I will be able to send > an updated patch in the next week. As you suggestion this patch needs > more careful thoughts. I'll move this patch to the next commit fest if > I will not be able to sent it. Is that okay? Fine by me. Thanks for the update. -- Michael
Attachment
On Mon, Jul 9, 2018 at 10:34 AM, Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Jul 09, 2018 at 10:06:00AM +0900, Masahiko Sawada wrote: >> I think that this patch might be splitted but I will be able to send >> an updated patch in the next week. As you suggestion this patch needs >> more careful thoughts. I'll move this patch to the next commit fest if >> I will not be able to sent it. Is that okay? > > Fine by me. Thanks for the update. Attached new version of patch incorporated the all comments I got from Michael-san. To prevent the WAL segment file of restart_lsn of the origin slot from removal during creating the target slot, I've chosen a way to copy new one while holding the origin one. One problem to implement this way is that the current replication slot code doesn't allow us to do straightforwardly; the code assumes that the process creating a new slot is not holding another slot. So I've changed the copy functions so that it save temporarily MyReplicationSlot and then restore and release it after creation the target slot. To ensure that both the origin and target slot are released properly in failure cases I used PG_ENSURE_ERROR_CLEANUP(). That way, we can keep the code changes of the logical decoding at a minimum. I've thought that we can change the logical decoding code so that it can assumes that the process can have more than one slots at once but it seems overkill to me. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Thu, Jul 12, 2018 at 9:28 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Mon, Jul 9, 2018 at 10:34 AM, Michael Paquier <michael@paquier.xyz> wrote: >> On Mon, Jul 09, 2018 at 10:06:00AM +0900, Masahiko Sawada wrote: >>> I think that this patch might be splitted but I will be able to send >>> an updated patch in the next week. As you suggestion this patch needs >>> more careful thoughts. I'll move this patch to the next commit fest if >>> I will not be able to sent it. Is that okay? >> >> Fine by me. Thanks for the update. > > Attached new version of patch incorporated the all comments I got from > Michael-san. > > To prevent the WAL segment file of restart_lsn of the origin slot from > removal during creating the target slot, I've chosen a way to copy new > one while holding the origin one. One problem to implement this way is > that the current replication slot code doesn't allow us to do > straightforwardly; the code assumes that the process creating a new > slot is not holding another slot. So I've changed the copy functions > so that it save temporarily MyReplicationSlot and then restore and > release it after creation the target slot. To ensure that both the > origin and target slot are released properly in failure cases I used > PG_ENSURE_ERROR_CLEANUP(). That way, we can keep the code changes of > the logical decoding at a minimum. I've thought that we can change the > logical decoding code so that it can assumes that the process can have > more than one slots at once but it seems overkill to me. > > Please review it. > The previous patch conflicts with the current HEAD. Attached updated version patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Tue, Aug 14, 2018 at 01:38:23PM +0900, Masahiko Sawada wrote: > On Thu, Jul 12, 2018 at 9:28 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Attached new version of patch incorporated the all comments I got from >> Michael-san. >> >> To prevent the WAL segment file of restart_lsn of the origin slot from >> removal during creating the target slot, I've chosen a way to copy new >> one while holding the origin one. One problem to implement this way is >> that the current replication slot code doesn't allow us to do >> straightforwardly; the code assumes that the process creating a new >> slot is not holding another slot. So I've changed the copy functions >> so that it save temporarily MyReplicationSlot and then restore and >> release it after creation the target slot. To ensure that both the >> origin and target slot are released properly in failure cases I used >> PG_ENSURE_ERROR_CLEANUP(). That way, we can keep the code changes of >> the logical decoding at a minimum. I've thought that we can change the >> logical decoding code so that it can assumes that the process can have >> more than one slots at once but it seems overkill to me. Yeah, we may be able to live with this trick. For other processes, the process doing the copy would be seen as holding the slot so the checkpointer would not advance the oldest LSN to retain. > The previous patch conflicts with the current HEAD. Attached updated > version patch. +-- Now we have maximum 8 replication slots. Check slots are properly +-- released even when raise error during creating the target slot. +SELECT 'copy' FROM pg_copy_logical_replication_slot('orig_slot1', 'failed'); -- error +ERROR: all replication slots are in use installcheck is going to fail on an instance which does not use exactly max_replication_slots = 8. That lacks flexibility, and you could have the same coverage by copying, then immediately dropping each new slot. + to a physical replicaiton slot named <parameter>dst_slot_name</parameter>. s/replicaiton/replicaton/. + The copied physical slot starts to reserve WAL from the same <acronym>LSN</acronym> as the + source slot if the source slot already reserves WAL. Or restricting this case? In what is it useful to allow a copy from a slot which has done nothing yet? This would also simplify the acquire and release logic of the source slot. + /* Check type of replication slot */ + if (SlotIsLogical(MyReplicationSlot)) + { + ReplicationSlotRelease(); + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + (errmsg("cannot copy a logical replication slot to a physical replication slot")))); + } No need to call ReplicationSlotRelease for an ERROR code path. Does it actually make sense to allow copy of temporary slots or change their persistence? Those don't live across sessions so they'd need to be copied in the same session which created them. -- Michael
Attachment
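To illustrate the simplification suggested above (a sketch only, relying on the fact that the backend's error cleanup releases any slot the process has acquired), the type check can simply throw without an explicit release:

if (SlotIsLogical(MyReplicationSlot))
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("cannot copy a logical replication slot to a physical replication slot")));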
On Mon, Aug 20, 2018 at 2:53 PM, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Aug 14, 2018 at 01:38:23PM +0900, Masahiko Sawada wrote: >> On Thu, Jul 12, 2018 at 9:28 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> Attached new version of patch incorporated the all comments I got from >>> Michael-san. >>> >>> To prevent the WAL segment file of restart_lsn of the origin slot from >>> removal during creating the target slot, I've chosen a way to copy new >>> one while holding the origin one. One problem to implement this way is >>> that the current replication slot code doesn't allow us to do >>> straightforwardly; the code assumes that the process creating a new >>> slot is not holding another slot. So I've changed the copy functions >>> so that it save temporarily MyReplicationSlot and then restore and >>> release it after creation the target slot. To ensure that both the >>> origin and target slot are released properly in failure cases I used >>> PG_ENSURE_ERROR_CLEANUP(). That way, we can keep the code changes of >>> the logical decoding at a minimum. I've thought that we can change the >>> logical decoding code so that it can assumes that the process can have >>> more than one slots at once but it seems overkill to me. > > Yeah, we may be able to live with this trick. For other processes, the > process doing the copy would be seen as holding the slot so the > checkpointer would not advance the oldest LSN to retain. > >> The previous patch conflicts with the current HEAD. Attached updated >> version patch. Thank you for reviewing this patch. > > +-- Now we have maximum 8 replication slots. Check slots are properly > +-- released even when raise error during creating the target slot. > +SELECT 'copy' FROM pg_copy_logical_replication_slot('orig_slot1', > 'failed'); -- error > +ERROR: all replication slots are in use > > installcheck is going to fail on an instance which does not use exactly > max_replication_slots = 8. That lacks flexibility, and you could have > the same coverage by copying, then immediately dropping each new slot. Will fix. > + to a physical replicaiton slot named <parameter>dst_slot_name</parameter>. > s/replicaiton/replicaton/. > > + The copied physical slot starts to reserve WAL from the same > <acronym>LSN</acronym> as the > + source slot if the source slot already reserves WAL. > Or restricting this case? In what is it useful to allow a copy from a > slot which has done nothing yet? This would also simplify the acquire > and release logic of the source slot. > I think the copying from a slot that already reserved WAL would be helpful for backup cases (maybe you suggested?). Also, either way we need to make a safe logic of acquring and releasing the source slot for logical slots cases. Or you meant to restrict the case where the copying a slot that doesn't reserve WAL? > + /* Check type of replication slot */ > + if (SlotIsLogical(MyReplicationSlot)) > + { > + ReplicationSlotRelease(); > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + (errmsg("cannot copy a logical replication slot to a > physical replication slot")))); > + } > No need to call ReplicationSlotRelease for an ERROR code path. > Will fix. > Does it actually make sense to allow copy of temporary slots or change > their persistence? Those don't live across sessions so they'd need to > be copied in the same session which created them. 
I think the copying of temporary slots would be an impracticable feature but the changing their persistence might be helpful for some cases, especially copying from persistent to temporary. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Aug 28, 2018 at 04:14:04PM +0900, Masahiko Sawada wrote: > I think the copying from a slot that already reserved WAL would be > helpful for backup cases (maybe you suggested?). Also, either way we > need to make a safe logic of acquring and releasing the source slot > for logical slots cases. Or you meant to restrict the case where the > copying a slot that doesn't reserve WAL? I mean the latter, as-known-as there is no actual point in being able to copy WAL which does *not* reserve WAL. >> Does it actually make sense to allow copy of temporary slots or change >> their persistence? Those don't live across sessions so they'd need to >> be copied in the same session which created them. > > I think the copying of temporary slots would be an impracticable > feature but the changing their persistence might be helpful for some > cases, especially copying from persistent to temporary. The session doing the copy of a permanent slot to the temporary slot has to be the one also consuming it as the session which created the slot owns it, and the slot would be dropped when the session ends. For logical slots perhaps you have something in mind? Like copying a slot which is not active to check where it is currently replaying, and using the copy for sanity checks? -- Michael
Attachment
On Tue, Aug 28, 2018 at 10:34 PM, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Aug 28, 2018 at 04:14:04PM +0900, Masahiko Sawada wrote: >> I think the copying from a slot that already reserved WAL would be >> helpful for backup cases (maybe you suggested?). Also, either way we >> need to make a safe logic of acquring and releasing the source slot >> for logical slots cases. Or you meant to restrict the case where the >> copying a slot that doesn't reserve WAL? > > I mean the latter, as-known-as there is no actual point in being able to > copy WAL which does *not* reserve WAL. Agreed. I'll restrict that case in the next version > >>> Does it actually make sense to allow copy of temporary slots or change >>> their persistence? Those don't live across sessions so they'd need to >>> be copied in the same session which created them. >> >> I think the copying of temporary slots would be an impracticable >> feature but the changing their persistence might be helpful for some >> cases, especially copying from persistent to temporary. > > The session doing the copy of a permanent slot to the temporary slot has > to be the one also consuming it as the session which created the slot > owns it, and the slot would be dropped when the session ends. For > logical slots perhaps you have something in mind? Like copying a slot > which is not active to check where it is currently replaying, and using > the copy for sanity checks? Yeah, I imagined such case. If we want to do backup/logical decoding from the same point as the source permanent slot we might want to use temporary slots so that it will be dropped surely after that. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Aug 29, 2018 at 9:39 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Tue, Aug 28, 2018 at 10:34 PM, Michael Paquier <michael@paquier.xyz> wrote: >> On Tue, Aug 28, 2018 at 04:14:04PM +0900, Masahiko Sawada wrote: >>> I think the copying from a slot that already reserved WAL would be >>> helpful for backup cases (maybe you suggested?). Also, either way we >>> need to make a safe logic of acquring and releasing the source slot >>> for logical slots cases. Or you meant to restrict the case where the >>> copying a slot that doesn't reserve WAL? >> >> I mean the latter, as-known-as there is no actual point in being able to >> copy WAL which does *not* reserve WAL. > > Agreed. I'll restrict that case in the next version > >> >>>> Does it actually make sense to allow copy of temporary slots or change >>>> their persistence? Those don't live across sessions so they'd need to >>>> be copied in the same session which created them. >>> >>> I think the copying of temporary slots would be an impracticable >>> feature but the changing their persistence might be helpful for some >>> cases, especially copying from persistent to temporary. >> >> The session doing the copy of a permanent slot to the temporary slot has >> to be the one also consuming it as the session which created the slot >> owns it, and the slot would be dropped when the session ends. For >> logical slots perhaps you have something in mind? Like copying a slot >> which is not active to check where it is currently replaying, and using >> the copy for sanity checks? > > Yeah, I imagined such case. If we want to do backup/logical decoding > from the same point as the source permanent slot we might want to use > temporary slots so that it will be dropped surely after that. > Attached a new version patch incorporated the all comments I got. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Hi, On 31/08/2018 07:03, Masahiko Sawada wrote: > > Attached a new version patch incorporated the all comments I got. > This looks pretty reasonable. I am personally not big fan of the C wrappers for overloaded functions, but that's what we need to do for opr_sanity to pass so I guess we'll have to use them. The more serious thing is: > + if (MyReplicationSlot) > + ReplicationSlotRelease(); > + > + /* Release the saved slot if exist while preventing double releasing */ > + if (savedslot && savedslot != MyReplicationSlot) This won't work as intended as the ReplicationSlotRelease() will set MyReplicationSlot to NULL, you might need to set aside MyReplicationSlot to yet another temp variable inside this function prior to releasing it. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Nov 25, 2018 at 12:27 AM Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > > Hi, > > On 31/08/2018 07:03, Masahiko Sawada wrote: > > > > Attached a new version patch incorporated the all comments I got. > > > > This looks pretty reasonable. Thank you for looking at this patch. > > I am personally not big fan of the C wrappers for overloaded functions, > but that's what we need to do for opr_sanity to pass so I guess we'll > have to use them. > > The more serious thing is: > > > + if (MyReplicationSlot) > > + ReplicationSlotRelease(); > > + > > + /* Release the saved slot if exist while preventing double releasing */ > > + if (savedslot && savedslot != MyReplicationSlot) > > This won't work as intended as the ReplicationSlotRelease() will set > MyReplicationSlot to NULL, you might need to set aside MyReplicationSlot > to yet another temp variable inside this function prior to releasing it. > You're right. I've fixed it by checking if we need to release the saved slot before releasing the origin slot. Is that right? Attached an updated patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
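A sketch of the corrected ordering being described (illustrative only; `release_saved` is a placeholder name): decide whether the saved source slot still needs releasing before ReplicationSlotRelease() resets MyReplicationSlot to NULL.

bool    release_saved = (savedslot != NULL && savedslot != MyReplicationSlot);

if (MyReplicationSlot)
    ReplicationSlotRelease();       /* sets MyReplicationSlot to NULL */

if (release_saved)
{
    MyReplicationSlot = savedslot;
    ReplicationSlotRelease();
}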
Hi, On 26/11/2018 01:29, Masahiko Sawada wrote: > On Sun, Nov 25, 2018 at 12:27 AM Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >> >> The more serious thing is: >> >>> + if (MyReplicationSlot) >>> + ReplicationSlotRelease(); >>> + >>> + /* Release the saved slot if exist while preventing double releasing */ >>> + if (savedslot && savedslot != MyReplicationSlot) >> >> This won't work as intended as the ReplicationSlotRelease() will set >> MyReplicationSlot to NULL, you might need to set aside MyReplicationSlot >> to yet another temp variable inside this function prior to releasing it. >> > > You're right. I've fixed it by checking if we need to release the > saved slot before releasing the origin slot. Is that right? > Attached an updated patch. > Sounds good. I do have one more minor gripe after reading though again: > + > + /* > + * The requested wal lsn is no longer available. We don't want to retry > + * it, so raise an error. > + */ > + if (!XLogRecPtrIsInvalid(requested_lsn)) > + { > + char filename[MAXFNAMELEN]; > + > + XLogFileName(filename, ThisTimeLineID, segno, wal_segment_size); > + ereport(ERROR, > + (errmsg("could not reserve WAL segment %s", filename))); > + } I would reword the comment to something like "The caller has requested a specific wal lsn which we failed to reserve. We can't retry here as the requested wal is no longer available." (It took me a while to understand this part). Also the ereport should have errcode as it's going to be thrown to user sessions and it might be better if the error itself used same wording as CheckXLogRemoved() and XLogRead() for consistency. What do you think? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Nov 27, 2018 at 3:46 AM Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > > Hi, > > On 26/11/2018 01:29, Masahiko Sawada wrote: > > On Sun, Nov 25, 2018 at 12:27 AM Petr Jelinek > > <petr.jelinek@2ndquadrant.com> wrote: > >> > >> The more serious thing is: > >> > >>> + if (MyReplicationSlot) > >>> + ReplicationSlotRelease(); > >>> + > >>> + /* Release the saved slot if exist while preventing double releasing */ > >>> + if (savedslot && savedslot != MyReplicationSlot) > >> > >> This won't work as intended as the ReplicationSlotRelease() will set > >> MyReplicationSlot to NULL, you might need to set aside MyReplicationSlot > >> to yet another temp variable inside this function prior to releasing it. > >> > > > > You're right. I've fixed it by checking if we need to release the > > saved slot before releasing the origin slot. Is that right? > > Attached an updated patch. > > > > Sounds good. > > I do have one more minor gripe after reading though again: > Thank you for the review comment and sorry for the late response. > > + > > + /* > > + * The requested wal lsn is no longer available. We don't want to retry > > + * it, so raise an error. > > + */ > > + if (!XLogRecPtrIsInvalid(requested_lsn)) > > + { > > + char filename[MAXFNAMELEN]; > > + > > + XLogFileName(filename, ThisTimeLineID, segno, wal_segment_size); > > + ereport(ERROR, > > + (errmsg("could not reserve WAL segment %s", filename))); > > + } > > I would reword the comment to something like "The caller has requested a > specific wal lsn which we failed to reserve. We can't retry here as the > requested wal is no longer available." (It took me a while to understand > this part). > > Also the ereport should have errcode as it's going to be thrown to user > sessions and it might be better if the error itself used same wording as > CheckXLogRemoved() and XLogRead() for consistency. What do you think? > I agreed your both comments. I've changed the above comment and ereport. Attached the updated version patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Hi, On 15/01/2019 02:56, Masahiko Sawada wrote: > On Tue, Nov 27, 2018 at 3:46 AM Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >>> + >>> + /* >>> + * The requested wal lsn is no longer available. We don't want to retry >>> + * it, so raise an error. >>> + */ >>> + if (!XLogRecPtrIsInvalid(requested_lsn)) >>> + { >>> + char filename[MAXFNAMELEN]; >>> + >>> + XLogFileName(filename, ThisTimeLineID, segno, wal_segment_size); >>> + ereport(ERROR, >>> + (errmsg("could not reserve WAL segment %s", filename))); >>> + } >> >> I would reword the comment to something like "The caller has requested a >> specific wal lsn which we failed to reserve. We can't retry here as the >> requested wal is no longer available." (It took me a while to understand >> this part). >> >> Also the ereport should have errcode as it's going to be thrown to user >> sessions and it might be better if the error itself used same wording as >> CheckXLogRemoved() and XLogRead() for consistency. What do you think? >> > > I agreed your both comments. I've changed the above comment and > ereport. Attached the updated version patch. > I went through this again and I am pretty much happy with the current version. So I am going to mark it as RFC. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, On 2019-01-15 10:56:04 +0900, Masahiko Sawada wrote: > + <primary>pg_copy_physical_replication_slot</primary> > + </indexterm> > + <literal><function>pg_copy_physical_replication_slot(<parameter>src_slot_name</parameter> <type>name</type>, <parameter>dst_slot_name</parameter><optional>, <parameter>temporary</parameter> <type>bool</type></optional>)</function></literal> > + </entry> > + <entry> > + (<parameter>slot_name</parameter> <type>name</type>, <parameter>lsn</parameter> <type>pg_lsn</type>) > + </entry> > + <entry> > + Copies an existing physical replication slot name <parameter>src_slot_name</parameter> > + to a physical replication slot named <parameter>dst_slot_name</parameter>. > + The copied physical slot starts to reserve WAL from the same <acronym>LSN</acronym> as the > + source slot. > + <parameter>temporary</parameter> is optional. If <parameter>temporary</parameter> > + is omitted, the same value as the source slot is used. > + </entry> > + </row> > + > + <row> > + <entry> > + <indexterm> > + <primary>pg_copy_logical_replication_slot</primary> > + </indexterm> > + <literal><function>pg_copy_logical_replication_slot(<parameter>src_slot_name</parameter> <type>name</type>, <parameter>dst_slot_name</parameter><optional>, <parameter>plugin</parameter> <type>name</type> <optional>, <parameter>temporary</parameter><type>boolean</type></optional></optional>)</function></literal> > + </entry> > + <entry> > + (<parameter>slot_name</parameter> <type>name</type>, <parameter>lsn</parameter> <type>pg_lsn</type>) > + </entry> > + <entry> > + Copies an existing logical replication slot name <parameter>src_slot_name</parameter> > + to a logical replication slot named <parameter>dst_slot_name</parameter> > + while changing the output plugin and persistence. The copied logical slot starts > + from the same <acronym>LSN</acronym> as the source logical slot. Both <parameter>plugin</parameter> and > + <parameter>temporary</parameter> are optional. If <parameter>plugin</parameter> > + or <parameter>temporary</parameter> are omitted, the same values as > + the source logical slot are used. > + </entry> > + </row> Would it make sense to move the differing options to the end of the argument list? Right now we have a few common params, then a different one, and then another common one? > @@ -271,7 +272,7 @@ CreateInitDecodingContext(char *plugin, > StrNCpy(NameStr(slot->data.plugin), plugin, NAMEDATALEN); > SpinLockRelease(&slot->mutex); > > - ReplicationSlotReserveWal(); > + ReplicationSlotReserveWal(restart_lsn); Why do we even need to call this? It ought to be guaranteed that there's sufficient WAL, right? And somehow it seems harder to understand to me that the reserve routine gets an LSN. > /* > * Reserve WAL for the currently active slot. > * > - * Compute and set restart_lsn in a manner that's appropriate for the type of > - * the slot and concurrency safe. > + * If an lsn to reserve is not requested, compute and set restart_lsn > + * in a manner that's appropriate for the type of the slot and concurrency safe. > + * If the reseved WAL is requested, set restart_lsn and check if the corresponding > + * wal segment is available. > */ > void > -ReplicationSlotReserveWal(void) > +ReplicationSlotReserveWal(XLogRecPtr requested_lsn) > { > ReplicationSlot *slot = MyReplicationSlot; > > @@ -1005,47 +1007,57 @@ ReplicationSlotReserveWal(void) > * The replication slot mechanism is used to prevent removal of required > * WAL. 
As there is no interlock between this routine and checkpoints, WAL > * segments could concurrently be removed when a now stale return value of > - * ReplicationSlotsComputeRequiredLSN() is used. In the unlikely case that > - * this happens we'll just retry. > + * ReplicationSlotsComputeRequiredLSN() is used. If the lsn to reserve is > + * not requested, in the unlikely case that this happens we'll just retry. > */ > while (true) > { > XLogSegNo segno; > XLogRecPtr restart_lsn; > > - /* > - * For logical slots log a standby snapshot and start logical decoding > - * at exactly that position. That allows the slot to start up more > - * quickly. > - * > - * That's not needed (or indeed helpful) for physical slots as they'll > - * start replay at the last logged checkpoint anyway. Instead return > - * the location of the last redo LSN. While that slightly increases > - * the chance that we have to retry, it's where a base backup has to > - * start replay at. > - */ > - if (!RecoveryInProgress() && SlotIsLogical(slot)) > + if (!XLogRecPtrIsInvalid(requested_lsn)) > { > - XLogRecPtr flushptr; > - > - /* start at current insert position */ > - restart_lsn = GetXLogInsertRecPtr(); > + /* Set the requested lsn */ > SpinLockAcquire(&slot->mutex); > - slot->data.restart_lsn = restart_lsn; > + slot->data.restart_lsn = requested_lsn; > SpinLockRelease(&slot->mutex); > - > - /* make sure we have enough information to start */ > - flushptr = LogStandbySnapshot(); > - > - /* and make sure it's fsynced to disk */ > - XLogFlush(flushptr); > } > else > { > - restart_lsn = GetRedoRecPtr(); > - SpinLockAcquire(&slot->mutex); > - slot->data.restart_lsn = restart_lsn; > - SpinLockRelease(&slot->mutex); > + /* > + * For logical slots log a standby snapshot and start logical decoding > + * at exactly that position. That allows the slot to start up more > + * quickly. > + * > + * That's not needed (or indeed helpful) for physical slots as they'll > + * start replay at the last logged checkpoint anyway. Instead return > + * the location of the last redo LSN. While that slightly increases > + * the chance that we have to retry, it's where a base backup has to > + * start replay at. > + */ > + if (!RecoveryInProgress() && SlotIsLogical(slot)) > + { > + XLogRecPtr flushptr; > + > + /* start at current insert position */ > + restart_lsn = GetXLogInsertRecPtr(); > + SpinLockAcquire(&slot->mutex); > + slot->data.restart_lsn = restart_lsn; > + SpinLockRelease(&slot->mutex); > + > + /* make sure we have enough information to start */ > + flushptr = LogStandbySnapshot(); > + > + /* and make sure it's fsynced to disk */ > + XLogFlush(flushptr); > + } > + else > + { > + restart_lsn = GetRedoRecPtr(); > + SpinLockAcquire(&slot->mutex); > + slot->data.restart_lsn = restart_lsn; > + SpinLockRelease(&slot->mutex); > + } > } > > /* prevent WAL removal as fast as possible */ > @@ -1061,6 +1073,21 @@ ReplicationSlotReserveWal(void) > XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size); > if (XLogGetLastRemovedSegno() < segno) > break; This seems like it's harder to understand than before. The loop (and most of the rest of the function) doesn't make sense for the copy case, so I think it'd be better to just move this into a separate function that just verifies that all the WAL is there. > + /* > + * The caller has requested a specific wal which we failed to reserve. > + * We can't retry here as the requested wal is no longer available. 
> + */ > + if (!XLogRecPtrIsInvalid(requested_lsn)) > + { > + char filename[MAXFNAMELEN]; > + > + XLogFileName(filename, ThisTimeLineID, segno, wal_segment_size); > + ereport(ERROR, > + (errcode(ERRCODE_UNDEFINED_FILE), > + errmsg("requested WAL segment %s has already been removed", > + filename))); > + } > } > } This ought to be unreachable, right? > +/* > + * Copy physical replication slot (3 arguments) > + * > + * note: this wrapper is necessary to pass the sanity check in opr_sanity, > + * which checks that all built-in functions that share the implementing C > + * function take the same number of arguments > + */ > +Datum > +pg_copy_physical_replication_slot_no_temp(PG_FUNCTION_ARGS) > +{ > + return pg_copy_physical_replication_slot(fcinfo); > +} You could avoid this by just defining the wrapper on the SQL level, but I'm ok with this. > + /* > + * To prevent the restart_lsn WAL of the source slot from removal > + * during copying a new slot, we copy it while holding the source slot. > + * Since we are not allowed to create a new one while holding another > + * one, we temporarily save the acquired slot and restore it after > + * creation. Set callback function to ensure we release replication > + * slots if fail below. > + */ > if (immediately_reserve) > + saveslot = MyReplicationSlot; > + else > + ReplicationSlotRelease(); Yikes, this is mightily ugly. Stupid question, but couldn't we optimize this to something like: /* * First copy current data of the slot. Then install those in the * new slot. The src slot could have progressed while installing, * but the installed values prevent global horizons from progressing * further. Therefore a second copy is sufficiently up2date. */ SpinLockAcquire(&src->mutex); copy_lsn = src->data.restart_lsn; copy_xid = ...; SpinLockRelease(&src->mutex); /* install copied values */ SpinLockAcquire(&src->mutex); /* copy data of slot again */ SpinLockRelease(&src->mutex); /* install again */ ? Greetings, Andres Freund
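To make the suggested split concrete, here is a minimal sketch of a standalone check for the requested-LSN case. The helper name CheckRequestedWalAvailability is hypothetical and the error wording is simply reused from the patch quoted above; this is an illustration of the idea, not code from the posted patch.

	/*
	 * Hypothetical sketch: error out if the WAL segment containing the
	 * requested restart_lsn has already been removed.  Unlike the normal
	 * reservation path there is nothing to retry here, because the caller
	 * asked for one specific LSN (taken from the source slot).
	 */
	static void
	CheckRequestedWalAvailability(XLogRecPtr requested_lsn)
	{
		XLogSegNo	segno;

		XLByteToSeg(requested_lsn, segno, wal_segment_size);

		/* The segment is gone if its number is <= the last removed one. */
		if (segno <= XLogGetLastRemovedSegno())
		{
			char		filename[MAXFNAMELEN];

			XLogFileName(filename, ThisTimeLineID, segno, wal_segment_size);
			ereport(ERROR,
					(errcode(ERRCODE_UNDEFINED_FILE),
					 errmsg("requested WAL segment %s has already been removed",
							filename)));
		}
	}

The copy path would call something like this once instead of entering the retry loop, since retrying cannot help when one specific LSN is needed.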
On Sat, Feb 16, 2019 at 12:34 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > Thank you for your comment. > On 2019-01-15 10:56:04 +0900, Masahiko Sawada wrote: > > > + <primary>pg_copy_physical_replication_slot</primary> > > + </indexterm> > > + <literal><function>pg_copy_physical_replication_slot(<parameter>src_slot_name</parameter> <type>name</type>,<parameter>dst_slot_name</parameter> <optional>, <parameter>temporary</parameter> <type>bool</type></optional>)</function></literal> > > + </entry> > > + <entry> > > + (<parameter>slot_name</parameter> <type>name</type>, <parameter>lsn</parameter> <type>pg_lsn</type>) > > + </entry> > > + <entry> > > + Copies an existing physical replication slot name <parameter>src_slot_name</parameter> > > + to a physical replication slot named <parameter>dst_slot_name</parameter>. > > + The copied physical slot starts to reserve WAL from the same <acronym>LSN</acronym> as the > > + source slot. > > + <parameter>temporary</parameter> is optional. If <parameter>temporary</parameter> > > + is omitted, the same value as the source slot is used. > > + </entry> > > + </row> > > + > > + <row> > > + <entry> > > + <indexterm> > > + <primary>pg_copy_logical_replication_slot</primary> > > + </indexterm> > > + <literal><function>pg_copy_logical_replication_slot(<parameter>src_slot_name</parameter> <type>name</type>,<parameter>dst_slot_name</parameter> <optional>, <parameter>plugin</parameter> <type>name</type> <optional>,<parameter>temporary</parameter> <type>boolean</type></optional></optional>)</function></literal> > > + </entry> > > + <entry> > > + (<parameter>slot_name</parameter> <type>name</type>, <parameter>lsn</parameter> <type>pg_lsn</type>) > > + </entry> > > + <entry> > > + Copies an existing logical replication slot name <parameter>src_slot_name</parameter> > > + to a logical replication slot named <parameter>dst_slot_name</parameter> > > + while changing the output plugin and persistence. The copied logical slot starts > > + from the same <acronym>LSN</acronym> as the source logical slot. Both <parameter>plugin</parameter> and > > + <parameter>temporary</parameter> are optional. If <parameter>plugin</parameter> > > + or <parameter>temporary</parameter> are omitted, the same values as > > + the source logical slot are used. > > + </entry> > > + </row> > > Would it make sense to move the differing options to the end of the > argument list? Right now we have a few common params, then a different > one, and then another common one? Agreed, will fix. > > > > @@ -271,7 +272,7 @@ CreateInitDecodingContext(char *plugin, > > StrNCpy(NameStr(slot->data.plugin), plugin, NAMEDATALEN); > > SpinLockRelease(&slot->mutex); > > > > - ReplicationSlotReserveWal(); > > + ReplicationSlotReserveWal(restart_lsn); > > Why do we even need to call this? It ought to be guaranteed that there's > sufficient WAL, right? And somehow it seems harder to understand to me > that the reserve routine gets an LSN. That's right in copy function cases. I'll change it so that CreateInitDecodingContext() sets the start lsn without WAL reservation routine if the passed-in restart_lsn is a valid value. So the caller must guarantee that the lsn is available. > > > > /* > > * Reserve WAL for the currently active slot. > > * > > - * Compute and set restart_lsn in a manner that's appropriate for the type of > > - * the slot and concurrency safe. 
> > + * If an lsn to reserve is not requested, compute and set restart_lsn > > + * in a manner that's appropriate for the type of the slot and concurrency safe. > > + * If the reseved WAL is requested, set restart_lsn and check if the corresponding > > + * wal segment is available. > > */ > > void > > -ReplicationSlotReserveWal(void) > > +ReplicationSlotReserveWal(XLogRecPtr requested_lsn) > > { > > ReplicationSlot *slot = MyReplicationSlot; > > > > @@ -1005,47 +1007,57 @@ ReplicationSlotReserveWal(void) > > * The replication slot mechanism is used to prevent removal of required > > * WAL. As there is no interlock between this routine and checkpoints, WAL > > * segments could concurrently be removed when a now stale return value of > > - * ReplicationSlotsComputeRequiredLSN() is used. In the unlikely case that > > - * this happens we'll just retry. > > + * ReplicationSlotsComputeRequiredLSN() is used. If the lsn to reserve is > > + * not requested, in the unlikely case that this happens we'll just retry. > > */ > > while (true) > > { > > XLogSegNo segno; > > XLogRecPtr restart_lsn; > > > > - /* > > - * For logical slots log a standby snapshot and start logical decoding > > - * at exactly that position. That allows the slot to start up more > > - * quickly. > > - * > > - * That's not needed (or indeed helpful) for physical slots as they'll > > - * start replay at the last logged checkpoint anyway. Instead return > > - * the location of the last redo LSN. While that slightly increases > > - * the chance that we have to retry, it's where a base backup has to > > - * start replay at. > > - */ > > - if (!RecoveryInProgress() && SlotIsLogical(slot)) > > + if (!XLogRecPtrIsInvalid(requested_lsn)) > > { > > - XLogRecPtr flushptr; > > - > > - /* start at current insert position */ > > - restart_lsn = GetXLogInsertRecPtr(); > > + /* Set the requested lsn */ > > SpinLockAcquire(&slot->mutex); > > - slot->data.restart_lsn = restart_lsn; > > + slot->data.restart_lsn = requested_lsn; > > SpinLockRelease(&slot->mutex); > > - > > - /* make sure we have enough information to start */ > > - flushptr = LogStandbySnapshot(); > > - > > - /* and make sure it's fsynced to disk */ > > - XLogFlush(flushptr); > > } > > else > > { > > - restart_lsn = GetRedoRecPtr(); > > - SpinLockAcquire(&slot->mutex); > > - slot->data.restart_lsn = restart_lsn; > > - SpinLockRelease(&slot->mutex); > > + /* > > + * For logical slots log a standby snapshot and start logical decoding > > + * at exactly that position. That allows the slot to start up more > > + * quickly. > > + * > > + * That's not needed (or indeed helpful) for physical slots as they'll > > + * start replay at the last logged checkpoint anyway. Instead return > > + * the location of the last redo LSN. While that slightly increases > > + * the chance that we have to retry, it's where a base backup has to > > + * start replay at. 
> > + */ > > + if (!RecoveryInProgress() && SlotIsLogical(slot)) > > + { > > + XLogRecPtr flushptr; > > + > > + /* start at current insert position */ > > + restart_lsn = GetXLogInsertRecPtr(); > > + SpinLockAcquire(&slot->mutex); > > + slot->data.restart_lsn = restart_lsn; > > + SpinLockRelease(&slot->mutex); > > + > > + /* make sure we have enough information to start */ > > + flushptr = LogStandbySnapshot(); > > + > > + /* and make sure it's fsynced to disk */ > > + XLogFlush(flushptr); > > + } > > + else > > + { > > + restart_lsn = GetRedoRecPtr(); > > + SpinLockAcquire(&slot->mutex); > > + slot->data.restart_lsn = restart_lsn; > > + SpinLockRelease(&slot->mutex); > > + } > > } > > > > /* prevent WAL removal as fast as possible */ > > @@ -1061,6 +1073,21 @@ ReplicationSlotReserveWal(void) > > XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size); > > if (XLogGetLastRemovedSegno() < segno) > > break; > > This seems like it's harder to understand than before. The loop (and > most of the rest of the function) doesn't make sense for the copy case, > so I think it'd be better to just move this into a separate function > that just verifies that all the WAL is there. Agreed, will fix. > > > > + /* > > + * The caller has requested a specific wal which we failed to reserve. > > + * We can't retry here as the requested wal is no longer available. > > + */ > > + if (!XLogRecPtrIsInvalid(requested_lsn)) > > + { > > + char filename[MAXFNAMELEN]; > > + > > + XLogFileName(filename, ThisTimeLineID, segno, wal_segment_size); > > + ereport(ERROR, > > + (errcode(ERRCODE_UNDEFINED_FILE), > > + errmsg("requested WAL segment %s has already been removed", > > + filename))); > > + } > > } > > } > > This ought to be unreachable, right? Right. > > > > > > +/* > > + * Copy physical replication slot (3 arguments) > > + * > > + * note: this wrapper is necessary to pass the sanity check in opr_sanity, > > + * which checks that all built-in functions that share the implementing C > > + * function take the same number of arguments > > + */ > > +Datum > > +pg_copy_physical_replication_slot_no_temp(PG_FUNCTION_ARGS) > > +{ > > + return pg_copy_physical_replication_slot(fcinfo); > > +} > > You could avoid this by just defining the wrapper on the SQL level, but > I'm ok with this. > > > > + /* > > + * To prevent the restart_lsn WAL of the source slot from removal > > + * during copying a new slot, we copy it while holding the source slot. > > + * Since we are not allowed to create a new one while holding another > > + * one, we temporarily save the acquired slot and restore it after > > + * creation. Set callback function to ensure we release replication > > + * slots if fail below. > > + */ > > if (immediately_reserve) > > + saveslot = MyReplicationSlot; > > + else > > + ReplicationSlotRelease(); > > Yikes, this is mightily ugly. > > Stupid question, but couldn't we optimize this to something like: > > /* > * First copy current data of the slot. Then install those in the > * new slot. The src slot could have progressed while installing, > * but the installed values prevent global horizons from progressing > * further. Therefore a second copy is sufficiently up2date. > */ > SpinLockAcquire(&src->mutex); > copy_lsn = src->data.restart_lsn; > copy_xid = ...; > SpinLockRelease(&src->mutex); > > /* install copied values */ > > > SpinLockAcquire(&src->mutex); > /* copy data of slot again */ > SpinLockRelease(&src->mutex); > > /* install again */ > > ? 
With this optimization, since we don't need to acquire the source slot, we can copy even from a slot that has already been acquired by someone, which is great. However, is it possible that, once the first spinlock on the source slot has been released, the slot could be dropped and the global horizons could advance before the copied values are installed? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
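As a rough sketch of the CreateInitDecodingContext() change described earlier in this reply, and assuming the function grows a restart_lsn parameter (an assumption about the next patch version rather than its actual shape), the relevant fragment of that function could look like this:

	/*
	 * Sketch: if the caller (the slot copy code) passed a valid
	 * restart_lsn, install it directly; the caller must guarantee that
	 * the corresponding WAL is still available.  Otherwise fall back to
	 * the normal WAL reservation.
	 */
	if (!XLogRecPtrIsInvalid(restart_lsn))
	{
		SpinLockAcquire(&slot->mutex);
		slot->data.restart_lsn = restart_lsn;
		SpinLockRelease(&slot->mutex);
	}
	else
		ReplicationSlotReserveWal();

With this, the reservation logic stays untouched for the normal path, and the copy path only has to guarantee (and verify) that the requested LSN is still available.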
Hi, On 2019-02-18 16:57:07 +0900, Masahiko Sawada wrote: > > Stupid question, but couldn't we optimize this to something like: > > > > /* > > * First copy current data of the slot. Then install those in the > > * new slot. The src slot could have progressed while installing, > > * but the installed values prevent global horizons from progressing > > * further. Therefore a second copy is sufficiently up2date. > > */ > > SpinLockAcquire(&src->mutex); > > copy_lsn = src->data.restart_lsn; > > copy_xid = ...; > > SpinLockRelease(&src->mutex); > > > > /* install copied values */ > > > > > > SpinLockAcquire(&src->mutex); > > /* copy data of slot again */ > > SpinLockRelease(&src->mutex); > > > > /* install again */ > > > > ? > > With this optimization, since we don't need to acquire the source slot, > we can copy even from a slot that has already been acquired by someone, > which is great. However, is it possible that, once the first spinlock on > the source slot has been released, the slot could be dropped and the > global horizons could advance before the copied values are installed? Well, I'd not thought we'd do it without acquiring the other slot. But that still seems to be easy enough to address, we just need to recheck whether the slot still exists (with the right name) the second time we acquire the spinlock? Greetings, Andres Freund
On Tue, Feb 19, 2019 at 1:28 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2019-02-18 16:57:07 +0900, Masahiko Sawada wrote: > > > Stupid question, but couldn't we optimize this to something like: > > > > > > /* > > > * First copy current data of the slot. Then install those in the > > > * new slot. The src slot could have progressed while installing, > > > * but the installed values prevent global horizons from progressing > > > * further. Therefore a second copy is sufficiently up2date. > > > */ > > > SpinLockAcquire(&src->mutex); > > > copy_lsn = src->data.restart_lsn; > > > copy_xid = ...; > > > SpinLockRelease(&src->mutex); > > > > > > /* install copied values */ > > > > > > > > > SpinLockAcquire(&src->mutex); > > > /* copy data of slot again */ > > > SpinLockRelease(&src->mutex); > > > > > > /* install again */ > > > > > > ? > > > > With this optimization, since we don't need to acquire the source slot, > > we can copy even from a slot that has already been acquired by someone, > > which is great. However, is it possible that, once the first spinlock on > > the source slot has been released, the slot could be dropped and the > > global horizons could advance before the copied values are installed? > > Well, I'd not thought we'd do it without acquiring the other slot. But > that still seems to be easy enough to address, we just need to recheck > whether the slot still exists (with the right name) the second time we > acquire the spinlock? Yeah, I think that would work. The attached patch takes this direction. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
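For readers following the thread, a condensed sketch of that direction: read the source slot's state under its spinlock, install it into the new slot, then re-read and re-verify the source. The variable names (src, src_name_saved, and so on) and the exact set of fields copied are illustrative assumptions, not necessarily what the attached patch does.

	ReplicationSlot *src;		/* source slot, looked up by name */
	XLogRecPtr	src_restart_lsn;
	XLogRecPtr	copy_restart_lsn;
	bool		src_islogical;
	bool		copy_islogical;
	char		src_name_saved[NAMEDATALEN];
	char		copy_name[NAMEDATALEN];

	/* 1. Snapshot the source slot's state under its spinlock. */
	SpinLockAcquire(&src->mutex);
	src_restart_lsn = src->data.restart_lsn;
	src_islogical = SlotIsLogical(src);
	strlcpy(src_name_saved, NameStr(src->data.name), NAMEDATALEN);
	SpinLockRelease(&src->mutex);

	/*
	 * 2. Create the destination slot and install the snapshot taken above.
	 * Once installed, those values keep the global xmin and WAL horizons
	 * from advancing past them, so nothing the copy still needs can be
	 * removed even if the source slot progresses in the meantime.
	 */

	/* 3. Re-read the source slot and verify it is still the same slot. */
	SpinLockAcquire(&src->mutex);
	copy_restart_lsn = src->data.restart_lsn;
	copy_islogical = SlotIsLogical(src);
	strlcpy(copy_name, NameStr(src->data.name), NAMEDATALEN);
	SpinLockRelease(&src->mutex);

	/*
	 * If the source slot was dropped (and perhaps re-created under the
	 * same name) while the first snapshot was being installed, its name
	 * or type changes, or its restart_lsn appears to go backwards; in
	 * that case the copy cannot be trusted, so give up.
	 */
	if (copy_restart_lsn < src_restart_lsn ||
		src_islogical != copy_islogical ||
		strcmp(copy_name, src_name_saved) != 0)
		ereport(ERROR,
				(errmsg("could not copy replication slot \"%s\"",
						src_name_saved),
				 errdetail("The source replication slot was dropped during the copy.")));

	/* 4. Install the (possibly advanced) re-read values into the new slot. */

The second read either confirms that the source is unchanged (or has only advanced), or detects that it was dropped and possibly re-created, in which case the copy is abandoned.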
On Tue, Feb 19, 2019 at 05:09:33PM +0900, Masahiko Sawada wrote: > On Tue, Feb 19, 2019 at 1:28 AM Andres Freund <andres@anarazel.de> wrote: >> Well, I'd not thought we'd do it without acquiring the other slot. But >> that still seems to be easy enough to address, we just need to recheck >> whether the slot still exists (with the right name) the second time we >> acquire the spinlock? > > Yeah, I think that would work. The attached patch takes this > direction. Please review it. + if (XLogRecPtrIsInvalid(copy_restart_lsn) || + copy_restart_lsn < src_restart_lsn || + src_islogical != copy_islogical || + strcmp(copy_name, NameStr(*src_name)) != 0) + ereport(ERROR, + (errmsg("could not copy logical replication slot \"%s\"", + NameStr(*src_name)), + errdetail("The source replication slot has been dropped during copy"))); + + /* Install copied values again */ + SpinLockAcquire(&MyReplicationSlot->mutex); Is it worth worrying about this window not being reduced to zero? If the slot is dropped between the two steps, the same issue would arise. -- Michael
Attachment
On Wed, Feb 20, 2019 at 12:26 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Feb 19, 2019 at 05:09:33PM +0900, Masahiko Sawada wrote: > > On Tue, Feb 19, 2019 at 1:28 AM Andres Freund <andres@anarazel.de> wrote: > >> Well, I'd not thought we'd do it without acquiring the other slot. But > >> that still seems to be easy enough to address, we just need to recheck > >> whether the slot still exists (with the right name) the second time we > >> acquire the spinlock? > > > > Yeah, I think that would work. The attached patch takes this > > direction. Please review it. > > + if (XLogRecPtrIsInvalid(copy_restart_lsn) || > + copy_restart_lsn < src_restart_lsn || > + src_islogical != copy_islogical || > + strcmp(copy_name, NameStr(*src_name)) != 0) > + ereport(ERROR, > + (errmsg("could not copy logical replication slot \"%s\"", > + NameStr(*src_name)), > + errdetail("The source replication slot has been dropped during copy"))); > + > + /* Install copied values again */ > + SpinLockAcquire(&MyReplicationSlot->mutex); > > Is it worth worrying about this window not being reduced to zero? If the > slot is dropped between the two steps, the same issue would arise. Did you mean that the destination slot (i.e. MyReplicationSlot) could be dropped before we acquire its lock? Since we are still holding the new slot, it cannot be dropped. BTW, XLogRecPtrIsInvalid(copy_restart_lsn) || copy_restart_lsn < src_restart_lsn is redundant, the former should be removed. Since an invalid LSN is zero, the copy_restart_lsn < src_restart_lsn check already covers that case as long as the source slot's restart_lsn is valid. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Feb 20, 2019 at 1:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > BTW, XLogRecPtrIsInvalid(copy_restart_lsn) || copy_restart_lsn < > src_restart_lsn is redundant, the former should be removed. > So I've attached the updated version of the patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Mon, Feb 25, 2019 at 4:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Feb 20, 2019 at 1:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > BTW, XLogRecPtrIsInvalid(copy_restart_lsn) || copy_restart_lsn < > > src_restart_lsn is redundant, the former should be removed. > > > > So I've attached the updated version of the patch. > Since the patch failed to build, I've attached an updated version of the patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On 3/11/19 5:16 AM, Masahiko Sawada wrote: > On Mon, Feb 25, 2019 at 4:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> On Wed, Feb 20, 2019 at 1:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> >>> BTW, XLogRecPtrIsInvalid(copy_restart_lsn) || copy_restart_lsn < >>> src_restart_lsn is redundant, the former should be removed. >>> >> >> So I've attached the updated version of the patch. >> > > Since the patch failed to build, I've attached an updated version of the patch. This patch is failing testing, so I have marked it Waiting on Author. Regards, -- -David david@pgmasters.net
On Mon, Mar 25, 2019 at 5:26 PM David Steele <david@pgmasters.net> wrote: > > On 3/11/19 5:16 AM, Masahiko Sawada wrote: > > On Mon, Feb 25, 2019 at 4:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >> > >> On Wed, Feb 20, 2019 at 1:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>> > >>> BTW, XLogRecPtrIsInvalid(copy_restart_lsn) || copy_restart_lsn < > >>> src_restart_lsn is redundant, the former should be removed. > >>> > >> > >> So I've attached the updated version of the patch. > >> > > > > Since the patch failed to build, I've attached an updated version of the patch. > > This patch is failing testing, so I have marked it Waiting on Author. > Thank you! The OIDs for the new replication slot copy functions conflicted with the newly added jsonpath functions. I've attached the updated patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Pushed this. The mechanism of creating a new slot from the source, then later advancing the LSN of the new slot using the updated values from the source slot, seems quite clever. I reworded the comment that explained how it is supposed to work; please double-check to ensure I got it right. I renamed the opr_sanity-induced wrappers. I see no reason to get very precise about what's what ... these functions are all identical. Thanks everyone! -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/04/2019 23:16, Alvaro Herrera wrote: > Pushed this. > Thanks! > The mechanism of creating a new slot from the source, then later > advancing the LSN of the new slot using the updated values from the > source slot, seems quite clever. I reworded the comment that explained > how it is supposed to work; please double-check to ensure I got it > right. > Besides the "We regards it" typo it looks fine to me. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Apr 6, 2019 at 6:16 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > Pushed this. Thank you! > > The mechanism of creating a new slot from the source, then later > advancing the LSN of the new slot using the updated values from the > source slot, seems quite clever. I reworded the comment that explained > how it is supposed to work; please double-check to ensure I got it > right. It looks good to me other than a typo pointed out by Petr. > > I renamed the opr_sanity-induced wrappers. I see no reason to get > very precise about what's what ... these functions are all identical. > Agreed. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 2019-Apr-05, Petr Jelinek wrote: > On 05/04/2019 23:16, Alvaro Herrera wrote: > > The mechanism of creating a new slot from the source, then later > > advancing the LSN of the new slot using the updated values from the > > source slot, seems quite clever. I reworded the comment that explained > > how it is supposed to work; please double-check to ensure I got it > > right. > > Besides the "We regards it" typo it looks fine to me. Doh! Thanks, fixed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services