Thread: Re: Separate GUC for replication origins

Re: Separate GUC for replication origins

From
Peter Eisentraut
Date:
On 10.12.24 19:41, Euler Taveira wrote:
> I'm attaching a patch that adds max_replication_origins. It basically 
> replaces
> all of the points that refers to max_replication_slots on the subscriber. It
> uses the same default value as max_replication_slots (10). I did nothing to
> keep the backward compatibility. I mean has a special value (like -1) that
> means same value as max_replication_slots. Is it worth it? (If not, it 
> should
> be mentioned in the commit message.)

I think some backward compatibility tweak like that would be useful. 
Otherwise, the net effect of this is that most people will have to do 
one more thing than before to keep their systems working.  Also, all 
deployment and automation scripts would have to be updated for this.




Re: Separate GUC for replication origins

From
Masahiko Sawada
Date:
On Wed, Jan 8, 2025 at 7:39 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Dec 19, 2024, at 10:31 AM, Peter Eisentraut wrote:
>
> On 10.12.24 19:41, Euler Taveira wrote:
> > I'm attaching a patch that adds max_replication_origins. It basically
> > replaces
> > all of the points that refers to max_replication_slots on the subscriber. It
> > uses the same default value as max_replication_slots (10). I did nothing to
> > keep the backward compatibility. I mean has a special value (like -1) that
> > means same value as max_replication_slots. Is it worth it? (If not, it
> > should
> > be mentioned in the commit message.)
>
> I think some backward compatibility tweak like that would be useful.
> Otherwise, the net effect of this is that most people will have to do
> one more thing than before to keep their systems working.  Also, all
> deployment and automation scripts would have to be updated for this.
>
> This new patch includes the special value (-1) that uses max_replication_slots
> instead.

Thank you for working on this. The proposed idea makes sense to me.

Here are some comments on v2 patch:

---
        /* Report this after the initial starting message for consistency. */
-       if (max_replication_slots == 0)
+       if (max_replication_origins == 0)
                ereport(ERROR,
                                (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
                                 errmsg("cannot start logical
replication workers when \"max_replication_slots\"=0")));

Need to update the error message too.

---
+               {"max_replication_origins",
+                       PGC_POSTMASTER,
+                       REPLICATION_SUBSCRIBERS,
+                       gettext_noop("Sets the maximum number of
simultaneously defined replication origins."),
+                       NULL
+               },

I think the description is not accurate; this GUC controls the maximum
number of simultaneous replication origins that can be setup.

---
Given that max_replication_origins doesn't control the maximum number
of replication origins that can be defined, probably we need to find a
better name. As Kuroda-san already mentioned some proposed names,
max_tracked_replication_origins or max_replication_origin_states seem
reasonable to me.

---
+#include "utils/guc_hooks.h"

I think #include'ing guc.h would be more appropriate.

------
        ereport(PANIC,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("replication slot checkpoint has wrong
checksum %u, expected %u",
                        crc, file_crc)));

I don't understand why we use the term "replication slot checkpoint"
in an error message for the replication origin code. It could be fixed
in a separate patch for back-patching.

---
It's not related to the proposed patch but I found a suspicious
behavior; when the server startup we try to restore the
replorigin_checkpoint file only when max_replication_slots (or
max_replication_origins with the patch) > 0. Therefore, we don't get
the error message "could not find free replication state, increase
\"max_replication_slots\"" when it's set to 0, even if we have some
replication states in the replorigin_checkpoint file. Which seems
quite confusing.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com