Re: Separate GUC for replication origins - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Separate GUC for replication origins
Date
Msg-id CAD21AoD-TqY1N8Vibnzr81ZVuQOjVDY0jF0zQi_e1_8oNLzieQ@mail.gmail.com
Whole thread Raw
In response to Re: Separate GUC for replication origins  (Peter Eisentraut <peter@eisentraut.org>)
List pgsql-hackers
On Wed, Jan 8, 2025 at 7:39 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Dec 19, 2024, at 10:31 AM, Peter Eisentraut wrote:
>
> On 10.12.24 19:41, Euler Taveira wrote:
> > I'm attaching a patch that adds max_replication_origins. It basically
> > replaces
> > all of the points that refers to max_replication_slots on the subscriber. It
> > uses the same default value as max_replication_slots (10). I did nothing to
> > keep the backward compatibility. I mean has a special value (like -1) that
> > means same value as max_replication_slots. Is it worth it? (If not, it
> > should
> > be mentioned in the commit message.)
>
> I think some backward compatibility tweak like that would be useful.
> Otherwise, the net effect of this is that most people will have to do
> one more thing than before to keep their systems working.  Also, all
> deployment and automation scripts would have to be updated for this.
>
> This new patch includes the special value (-1) that uses max_replication_slots
> instead.

Thank you for working on this. The proposed idea makes sense to me.

Here are some comments on v2 patch:

---
        /* Report this after the initial starting message for consistency. */
-       if (max_replication_slots == 0)
+       if (max_replication_origins == 0)
                ereport(ERROR,
                                (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
                                 errmsg("cannot start logical
replication workers when \"max_replication_slots\"=0")));

Need to update the error message too.

---
+               {"max_replication_origins",
+                       PGC_POSTMASTER,
+                       REPLICATION_SUBSCRIBERS,
+                       gettext_noop("Sets the maximum number of
simultaneously defined replication origins."),
+                       NULL
+               },

I think the description is not accurate; this GUC controls the maximum
number of simultaneous replication origins that can be setup.

---
Given that max_replication_origins doesn't control the maximum number
of replication origins that can be defined, probably we need to find a
better name. As Kuroda-san already mentioned some proposed names,
max_tracked_replication_origins or max_replication_origin_states seem
reasonable to me.

---
+#include "utils/guc_hooks.h"

I think #include'ing guc.h would be more appropriate.

------
        ereport(PANIC,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("replication slot checkpoint has wrong
checksum %u, expected %u",
                        crc, file_crc)));

I don't understand why we use the term "replication slot checkpoint"
in an error message for the replication origin code. It could be fixed
in a separate patch for back-patching.

---
It's not related to the proposed patch but I found a suspicious
behavior; when the server startup we try to restore the
replorigin_checkpoint file only when max_replication_slots (or
max_replication_origins with the patch) > 0. Therefore, we don't get
the error message "could not find free replication state, increase
\"max_replication_slots\"" when it's set to 0, even if we have some
replication states in the replorigin_checkpoint file. Which seems
quite confusing.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Robins Tharakan
Date:
Subject: Re: Convert sepgsql tests to TAP
Next
From: Andrew Dunstan
Date:
Subject: Re: Convert sepgsql tests to TAP