Re: State of pg_createsubscriber - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: State of pg_createsubscriber
Date
Msg-id CAA4eK1+32Loq_STwEb3S0mChuz_p4Ts+8A+1w8vEcXU_O5dqKw@mail.gmail.com
Whole thread Raw
In response to Re: State of pg_createsubscriber  ("Euler Taveira" <euler@eulerto.com>)
Responses Re: State of pg_createsubscriber
List pgsql-hackers
On Tue, Jun 18, 2024 at 12:41 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Jun 18, 2024, at 12:59 AM, Amit Kapila wrote:
>
> On Mon, May 20, 2024 at 12:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sun, May 19, 2024 at 11:20 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Sun, May 19, 2024, at 2:30 PM, Tom Lane wrote:
> > >
> > > I'm fairly disturbed about the readiness of pg_createsubscriber.
> > > The 040_pg_createsubscriber.pl TAP test is moderately unstable
> > > in the buildfarm [1], and there are various unaddressed complaints
> > > at the end of the patch thread (pretty much everything below [2]).
> > > I guess this is good enough to start beta with, but it's far from
> > > being good enough to ship, IMO.  If there were active work going
> > > on to fix these things, I'd feel better, but neither the C code
> > > nor the test script have been touched since 1 April.
> > >
> > >
> > > My bad. :( I'll post patches soon to address all of the points.
> > >
> >
> > Just to summarize, apart from BF failures for which we had some
> > discussion, I could recall the following open points:
> >
> > 1. After promotion, the pre-existing replication objects should be
> > removed (either optionally or always), otherwise, it can lead to a new
> > subscriber not being able to restart or getting some unwarranted data.
> > [1][2].
> >
> > 2. Retaining synced slots on new subscribers can lead to unnecessary
> > WAL retention and dead rows [3].
> >
> > 3. We need to consider whether some of the messages displayed in
> > --dry-run mode are useful or not [4].
> >
>
> The recent commits should silence BF failures and resolve point number
> 2. But we haven't done anything yet for 1 and 3. For 3, we have a
> patch in email [1] (v3-0005-Avoid*) which can be reviewed and
> committed but point number 1 needs discussion. Apart from that
> somewhat related to point 1, Kuroda-San has raised a point in an email
> [2] for replication slots. Shlok has presented a case in this thread
> [3] where the problem due to point 1 can cause ERRORs or can cause
> data inconsistency.
>
>
> I read v3-0005 and it seems to silence (almost) all "write" messages. Does it
> intend to avoid the misinterpretation that the dry run mode is writing
> something? It is dry run mode! If I run a tool in dry run mode, I expect it to
> execute some verifications and print useful messages so I can evaluate if it is
> ok to run it. Maybe it is not your expectation for dry run mode.
>

I haven't studied the patch so can't comment but the intention was to
not print some unrelated write messages. I have shared my observation
in the email [1].

> I think if it not clear, let's inform that it changes nothing in dry run mode.
>
> pg_createsubscriber: no modifications are done
>
> as a first message in dry run mode. I agree with you when you pointed out that
> some messages are misleading.
>
> pg_createsubscriber: hint: If pg_createsubscriber fails after this
> point, you must recreate the physical replica before continuing.
>
> Maybe for this one, we omit the fake information, like:
>
> pg_createsubscriber: setting the replication progress on database "postgres"
>

I think we don't need to display this message as we are not going to
do anything for this in the --dry-run mode. We can even move the
related code in (!dry_run) check.

> I will post a patch to address the messages once we agree what needs to be
> changed.
>

I suggest we can start a new thread with the messages shared in the
email [1] and your response for each one of those.

> Regarding 3, publications and subscriptions are ok to remove. You are not
> allowed to create them on standby, hence, all publications and subscriptions
> are streamed from primary. However, I'm wondering if you want to remove the
> publications.
>

I am not so sure of publications but we should remove subscriptions as
there are clear problems with those as shown by Shlok in this thread.

> Replication slots on a standby server are "invalidated" despite
> of the wal_status is saying "reserved" (I think it is an oversight in the
> design that implements slot invalidation), however, required WAL files have
> already been removed because of pg_resetwal (see modify_subscriber_sysid()).
> The scenario is to promote a standby server, run pg_resetwal on it and check
> pg_replication_slots.
>

Ideally, invalidated slots shouldn't create any problems but it is
better that we discuss this also as a separate problem in new thread.

> Do you have any other scenarios in mind?
>

No, so we have three issues to discuss (a) some unwarranted messages
in --dry-run mode; (b) whether to remove pre-existing subscriptions
during conversion; (c) whether to remove pre-existing replication
slots.

Would you like to start three new threads for each of these or would
you like Kuroda-San or me to start some or all?

[1] - https://www.postgresql.org/message-id/CAA4eK1J2fAvsJ2HihbWJ_GxETd6sdqSMrZdCVJEutRZRpm1MEQ%40mail.gmail.com

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Yugo NAGATA
Date:
Subject: Re: Document NULL
Next
From: Amit Kapila
Date:
Subject: Re: State of pg_createsubscriber