Re: speed up a logical replica setup - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: speed up a logical replica setup |
Date | |
Msg-id | CALDaNm3Fg4zJ9RxnQPSW7FFqy_K605urNM_B3zAaO84Dr7_zcA@mail.gmail.com |
In response to | Re: speed up a logical replica setup (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
RE: speed up a logical replica setup
Re: speed up a logical replica setup |
List | pgsql-hackers |
On Sat, 9 Mar 2024 at 00:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote:
> > Dear Tomas, Euler,
> >
> > Thanks for starting to read the thread! Since I'm not an original author,
> > I want to reply partially.
> >
> >> I decided to take a quick look on this patch today, to see how it works
> >> and do some simple tests. I've only started to get familiar with it, so
> >> I have only some comments / questions regarding usage, not on the code.
> >> It's quite possible I didn't understand some finer points, or maybe it
> >> was already discussed earlier in this very long thread, so please feel
> >> free to push back or point me to the past discussion.
> >>
> >> Also, some of this is rather opinionated, but considering I didn't see
> >> this patch before, my opinions may easily be wrong ...
> >
> > I felt your comments were quit valuable.
> >
> >> 1) SGML docs
> >>
> >> It seems the SGML docs are more about explaining how this works on the
> >> inside, rather than how to use the tool. Maybe that's intentional, but
> >> as someone who didn't work with pg_createsubscriber before I found it
> >> confusing and not very helpful.
> >>
> >> For example, the first half of the page is prerequisities+warning, and
> >> sure those are useful details, but prerequisities are checked by the
> >> tool (so I can't really miss this) and warnings go into a lot of details
> >> about different places where things may go wrong. Sure, worth knowing
> >> and including in the docs, but maybe not right at the beginning, before
> >> I learn how to even run the tool?
> >
> > Hmm, right. I considered below improvements. Tomas and Euler, how do you think?
> >
> > * Adds more descriptions in "Description" section.
> > * Moves prerequisities+warning to "Notes" section.
> > * Adds "Usage" section which describes from a single node.
> >
> >> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> >> sure it won't work for a number of use cases. I know large databases
> >> it's common to create "work tables" (not necessarily temporary) as part
> >> of a batch job, but there's no need to replicate those tables.
> >
> > Indeed, the documentation does not describe that all tables in the database
> > would be included in the publication.
> >
> >> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> >> it may be an acceptable limitation, but maybe it'd be good to also
> >> support restricting which tables should be replicated (e.g. blacklist or
> >> whitelist based on table/schema name?).
> >
> > May not directly related, but we considered that accepting options was a next-step [1].
> >
> >> Note: I now realize this might fall under the warning about DDL, which
> >> says this:
> >>
> >> Executing DDL commands on the source server while running
> >> pg_createsubscriber is not recommended. If the target server has
> >> already been converted to logical replica, the DDL commands must
> >> not be replicated so an error would occur.
> >
> > Yeah, they would not be replicated, but not lead ERROR.
> > So should we say like "Creating tables on the source server..."?
> >
>
> Perhaps. Clarifying the docs would help, but it depends on the wording.
> For example, I doubt this should talk about "creating tables" because
> there are other DDL that (probably) could cause issues (like adding a
> column to the table, or something like that).
>
> >> 5) slot / publication / subscription name
> >>
> >> I find it somewhat annoying it's not possible to specify names for
> >> objects created by the tool - replication slots, publication and
> >> subscriptions. If this is meant to be a replica running for a while,
> >> after a while I'll have no idea what pg_createsubscriber_569853 or
> >> pg_createsubscriber_459548_2348239 was meant for.
> >>
> >> This is particularly annoying because renaming these objects later is
> >> either not supported at all (e.g. for replication slots), or may be
> >> quite difficult (e.g. publications).
> >>
> >> I do realize there are challenges with custom names (say, if there are
> >> multiple databases to replicate), but can't we support some simple
> >> formatting with basic placeholders? So we could specify
> >>
> >> --slot-name "myslot_%d_%p"
> >>
> >> or something like that?
> >
> > Not sure we can do in the first version, but looks nice. One concern is that I
> > cannot find applications which accepts escape strings like log_line_prefix.
> > (It may just because we do not have use-case.) Do you know examples?
> >
>
> I can't think of a tool already doing that, but I think that's simply
> because it was not needed. Why should we be concerned about this?
>
> >> BTW what will happen if we convert multiple standbys? Can't they all get
> >> the same slot name (they all have the same database OID, and I'm not
> >> sure how much entropy the PID has)?
> >
> > I tested and the second try did not work. The primal reason was the name of publication
> > - pg_createsubscriber_%u (oid).
> > FYI - previously we can reuse same publications, but based on my comment [2] the
> > feature was removed. It might be too optimistic.
> >
>
> OK. I could be convinced the other limitations are reasonable for v1 and
> can be improved later, but this seems like something that needs fixing.

+1 to handle this.
Currently,
a) Publication name = pg_createsubscriber_%u, where %u is database oid,
b) Replication slot name = pg_createsubscriber_%u_%d, where %u is database oid and %d is the pid, and
c) Subscription name = pg_createsubscriber_%u_%d, where %u is database oid and %d is the pid.

How about we have a non-mandatory option like --prefix_object_name=mysetup1, which will create
a) Publication name = mysetup1_pg_createsubscriber_%u,
b) Replication slot name = mysetup1_pg_createsubscriber_%u (here pid is also removed), and
c) Subscription name = mysetup1_pg_createsubscriber_%u (here pid is also removed).

In the default case where the user does not specify --prefix_object_name, the object names will be created without any prefix.

Regards,
Vignesh
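
For illustration only, the two naming schemes described above can be sketched in Python. This is not the tool's code: the function names current_names and prefixed_names are hypothetical, the OID and pid values are made up, and --prefix_object_name is only a proposal in this message, not an existing pg_createsubscriber option.

```python
from typing import Dict


def current_names(db_oid: int, pid: int) -> Dict[str, str]:
    """Names as described in this message for the current patch (illustrative)."""
    base = f"pg_createsubscriber_{db_oid}"
    return {
        "publication": base,             # pg_createsubscriber_%u
        "slot": f"{base}_{pid}",         # pg_createsubscriber_%u_%d
        "subscription": f"{base}_{pid}", # pg_createsubscriber_%u_%d
    }


def prefixed_names(prefix: str, db_oid: int) -> Dict[str, str]:
    """Names under the proposed (hypothetical) --prefix_object_name option.

    The pid is dropped; the user-supplied prefix is what keeps names distinct.
    """
    name = f"{prefix}_pg_createsubscriber_{db_oid}"
    return {"publication": name, "slot": name, "subscription": name}


if __name__ == "__main__":
    # Two standbys converted from the same primary share the database OID,
    # so the current publication name is identical for both runs.
    first = current_names(db_oid=16384, pid=4242)
    second = current_names(db_oid=16384, pid=5151)
    print(first["publication"] == second["publication"])

    # With distinct prefixes, every object name differs between the setups.
    print(prefixed_names("mysetup1", 16384))
    print(prefixed_names("mysetup2", 16384))
```

The sketch shows why the publication name is the one that collides when a second standby is converted: it depends only on the database OID, while the slot and subscription names also carry the pid. With a per-setup prefix, uniqueness no longer relies on the pid.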