Home > mailing lists

Re: speed up a logical replica setup - Mailing list pgsql-hackers

From	vignesh C
Subject	Re: speed up a logical replica setup
Date	March 11, 2024 04:12:06
Msg-id	CALDaNm3Fg4zJ9RxnQPSW7FFqy_K605urNM_B3zAaO84Dr7_zcA@mail.gmail.com Whole thread
In response to	Re: speed up a logical replica setup (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	RE: speed up a logical replica setup Re: speed up a logical replica setup
List	pgsql-hackers

Tree view

On Sat, 9 Mar 2024 at 00:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote:
> > Dear Tomas, Euler,
> >
> > Thanks for starting to read the thread! Since I'm not an original author,
> > I want to reply partially.
> >
> >> I decided to take a quick look on this patch today, to see how it works
> >> and do some simple tests. I've only started to get familiar with it, so
> >> I have only some comments / questions regarding usage, not on the code.
> >> It's quite possible I didn't understand some finer points, or maybe it
> >> was already discussed earlier in this very long thread, so please feel
> >> free to push back or point me to the past discussion.
> >>
> >> Also, some of this is rather opinionated, but considering I didn't see
> >> this patch before, my opinions may easily be wrong ...
> >
> > I felt your comments were quit valuable.
> >
> >> 1) SGML docs
> >>
> >> It seems the SGML docs are more about explaining how this works on the
> >> inside, rather than how to use the tool. Maybe that's intentional, but
> >> as someone who didn't work with pg_createsubscriber before I found it
> >> confusing and not very helpful.
> >>
> >> For example, the first half of the page is prerequisities+warning, and
> >> sure those are useful details, but prerequisities are checked by the
> >> tool (so I can't really miss this) and warnings go into a lot of details
> >> about different places where things may go wrong. Sure, worth knowing
> >> and including in the docs, but maybe not right at the beginning, before
> >> I learn how to even run the tool?
> >
> > Hmm, right. I considered below improvements. Tomas and Euler, how do you think?
> >
> > * Adds more descriptions in "Description" section.
> > * Moves prerequisities+warning to "Notes" section.
> > * Adds "Usage" section which describes from a single node.
> >
> >> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> >> sure it won't work for a number of use cases. I know large databases
> >> it's common to create "work tables" (not necessarily temporary) as part
> >> of a batch job, but there's no need to replicate those tables.
> >
> > Indeed, the documentation does not describe that all tables in the database
> > would be included in the publication.
> >
> >> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> >> it may be an acceptable limitation, but maybe it'd be good to also
> >> support restricting which tables should be replicated (e.g. blacklist or
> >> whitelist based on table/schema name?).
> >
> > May not directly related, but we considered that accepting options was a next-step [1].
> >
> >> Note: I now realize this might fall under the warning about DDL, which
> >> says this:
> >>
> >>     Executing DDL commands on the source server while running
> >>     pg_createsubscriber is not recommended. If the target server has
> >>     already been converted to logical replica, the DDL commands must
> >>     not be replicated so an error would occur.
> >
> > Yeah, they would not be replicated, but not lead ERROR.
> > So should we say like "Creating tables on the source server..."?
> >
>
> Perhaps. Clarifying the docs would help, but it depends on the wording.
> For example, I doubt this should talk about "creating tables" because
> there are other DDL that (probably) could cause issues (like adding a
> column to the table, or something like that).
>
> >> 5) slot / publication / subscription name
> >>
> >> I find it somewhat annoying it's not possible to specify names for
> >> objects created by the tool - replication slots, publication and
> >> subscriptions. If this is meant to be a replica running for a while,
> >> after a while I'll have no idea what pg_createsubscriber_569853 or
> >> pg_createsubscriber_459548_2348239 was meant for.
> >>
> >> This is particularly annoying because renaming these objects later is
> >> either not supported at all (e.g. for replication slots), or may be
> >> quite difficult (e.g. publications).
> >>
> >> I do realize there are challenges with custom names (say, if there are
> >> multiple databases to replicate), but can't we support some simple
> >> formatting with basic placeholders? So we could specify
> >>
> >> --slot-name "myslot_%d_%p"
> >>
> >> or something like that?
> >
> > Not sure we can do in the first version, but looks nice. One concern is that I
> > cannot find applications which accepts escape strings like log_line_prefix.
> > (It may just because we do not have use-case.) Do you know examples?
> >
>
> I can't think of a tool already doing that, but I think that's simply
> because it was not needed. Why should we be concerned about this?
>
> >> BTW what will happen if we convert multiple standbys? Can't they all get
> >> the same slot name (they all have the same database OID, and I'm not
> >> sure how much entropy the PID has)?
> >
> > I tested and the second try did not work. The primal reason was the name of publication
> > - pg_createsubscriber_%u (oid).
> > FYI - previously we can reuse same publications, but based on my comment [2] the
> > feature was removed. It might be too optimistic.
> >
>
> OK. I could be convinced the other limitations are reasonable for v1 and
> can be improved later, but this seems like something that needs fixing.

+1 to handle this.
Currently, a) Publication name = pg_createsubscriber_%u, where %u is
database oid, b) Replication slot name =  pg_createsubscriber_%u_%d,
Where %u is database oid and %d is the pid and c) Subscription name =
pg_createsubscriber_%u_%d, Where %u is database oid and %d is the pid
How about we have a non mandatory option like
--prefix_object_name=mysetup1, which will create a) Publication name =
mysetup1_pg_createsubscriber_%u, and b) Replication slot name =
mysetup1_pg_createsubscriber_%u (here pid is also removed) c)
Subscription name = mysetup1_pg_createsubscriber_%u (here pid is also
removed).

In the default case where the user does not specify
--prefix_object_name the object names will be created without any
prefix names.

Regards,
Vignesh

pgsql-hackers by date:

From: vignesh C
Date: 11 March 2024, 03:56:52
Subject: Re: speed up a logical replica setup

From: Amit Kapila
Date: 11 March 2024, 04:15:27
Subject: Re: Documentation: warn about two_phase when altering a subscription

Re: speed up a logical replica setup - Mailing list pgsql-hackers

Previous

Next