Re: Copy function for logical replication slots - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Copy function for logical replication slots
Date
Msg-id CAD21AoDPwxBEyv1d7dvihY=bQ9zqFdputbkfOidMCe60T6+nfA@mail.gmail.com
Whole thread Raw
In response to Re: Copy function for logical replication slots  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Copy function for logical replication slots
List pgsql-hackers
On Mon, Aug 20, 2018 at 2:53 PM, Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Aug 14, 2018 at 01:38:23PM +0900, Masahiko Sawada wrote:
>> On Thu, Jul 12, 2018 at 9:28 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> Attached new version of patch incorporated the all comments I got from
>>> Michael-san.
>>>
>>> To prevent the WAL segment file of restart_lsn of the origin slot from
>>> removal during creating the target slot, I've chosen a way to copy new
>>> one while holding the origin one. One problem to implement this way is
>>> that the current replication slot code doesn't allow us to do
>>> straightforwardly; the code assumes that the process creating a new
>>> slot is not holding another slot. So I've changed the copy functions
>>> so that it save temporarily MyReplicationSlot and then restore and
>>> release it after creation the target slot. To ensure that both the
>>> origin and target slot are released properly in failure cases I used
>>> PG_ENSURE_ERROR_CLEANUP(). That way, we can keep the code changes of
>>> the logical decoding at a minimum. I've thought that we can change the
>>> logical decoding code so that it can assumes that the process can have
>>> more than one slots at once but it seems overkill to me.
>
> Yeah, we may be able to live with this trick.  For other processes, the
> process doing the copy would be seen as holding the slot so the
> checkpointer would not advance the oldest LSN to retain.
>
>> The previous patch conflicts with the current HEAD. Attached updated
>> version patch.

Thank you for reviewing this patch.

>
> +-- Now we have maximum 8 replication slots. Check slots are properly
> +-- released even when raise error during creating the target slot.
> +SELECT 'copy' FROM pg_copy_logical_replication_slot('orig_slot1',
> 'failed'); -- error
> +ERROR:  all replication slots are in use
>
> installcheck is going to fail on an instance which does not use exactly
> max_replication_slots = 8.  That lacks flexibility, and you could have
> the same coverage by copying, then immediately dropping each new slot.

Will fix.

> +        to a physical replicaiton slot named <parameter>dst_slot_name</parameter>.
> s/replicaiton/replicaton/.
>
> +        The copied physical slot starts to reserve WAL from the same
> <acronym>LSN</acronym> as the
> +        source slot if the source slot already reserves WAL.
> Or restricting this case?  In what is it useful to allow a copy from a
> slot which has done nothing yet?  This would also simplify the acquire
> and release logic of the source slot.
>

I think the copying from a slot that already reserved WAL would be
helpful for backup cases (maybe you suggested?). Also, either way we
need to make a safe logic of acquring and releasing the source slot
for logical slots cases. Or you meant  to restrict the case where the
copying a slot that doesn't reserve WAL?

> +   /* Check type of replication slot */
> +   if (SlotIsLogical(MyReplicationSlot))
> +   {
> +       ReplicationSlotRelease();
> +       ereport(ERROR,
> +               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                (errmsg("cannot copy a logical replication slot to a
> physical replication slot"))));
> +   }
> No need to call ReplicationSlotRelease for an ERROR code path.
>

Will fix.


> Does it actually make sense to allow copy of temporary slots or change
> their persistence?  Those don't live across sessions so they'd need to
> be copied in the same session which created them.

I think the copying of temporary slots would be an impracticable
feature but the changing their persistence might be helpful for some
cases, especially copying from persistent to temporary.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: some more error location support
Next
From: Andres Freund
Date:
Subject: Re: buildfarm: could not read block 3 in file "base/16384/2662":read only 0 of 8192 bytes