Re: speed up a logical replica setup - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: speed up a logical replica setup
Date
Msg-id 8d52c226-7e34-44f7-a919-759bf8d81541@enterprisedb.com
Whole thread Raw
In response to RE: speed up a logical replica setup  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses Re: speed up a logical replica setup
List pgsql-hackers

On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote:
> Dear Tomas, Euler,
> 
> Thanks for starting to read the thread! Since I'm not an original author,
> I want to reply partially.
> 
>> I decided to take a quick look on this patch today, to see how it works
>> and do some simple tests. I've only started to get familiar with it, so
>> I have only some comments / questions regarding usage, not on the code.
>> It's quite possible I didn't understand some finer points, or maybe it
>> was already discussed earlier in this very long thread, so please feel
>> free to push back or point me to the past discussion.
>>
>> Also, some of this is rather opinionated, but considering I didn't see
>> this patch before, my opinions may easily be wrong ...
> 
> I felt your comments were quit valuable.
> 
>> 1) SGML docs
>>
>> It seems the SGML docs are more about explaining how this works on the
>> inside, rather than how to use the tool. Maybe that's intentional, but
>> as someone who didn't work with pg_createsubscriber before I found it
>> confusing and not very helpful.
>>
>> For example, the first half of the page is prerequisities+warning, and
>> sure those are useful details, but prerequisities are checked by the
>> tool (so I can't really miss this) and warnings go into a lot of details
>> about different places where things may go wrong. Sure, worth knowing
>> and including in the docs, but maybe not right at the beginning, before
>> I learn how to even run the tool?
> 
> Hmm, right. I considered below improvements. Tomas and Euler, how do you think?
> 
> * Adds more descriptions in "Description" section.
> * Moves prerequisities+warning to "Notes" section.
> * Adds "Usage" section which describes from a single node.
> 
>> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
>> sure it won't work for a number of use cases. I know large databases
>> it's common to create "work tables" (not necessarily temporary) as part
>> of a batch job, but there's no need to replicate those tables.
> 
> Indeed, the documentation does not describe that all tables in the database
> would be included in the publication.
> 
>> I do understand that FOR ALL TABLES is the simplest approach, and for v1
>> it may be an acceptable limitation, but maybe it'd be good to also
>> support restricting which tables should be replicated (e.g. blacklist or
>> whitelist based on table/schema name?).
> 
> May not directly related, but we considered that accepting options was a next-step [1].
> 
>> Note: I now realize this might fall under the warning about DDL, which
>> says this:
>>
>>     Executing DDL commands on the source server while running
>>     pg_createsubscriber is not recommended. If the target server has
>>     already been converted to logical replica, the DDL commands must
>>     not be replicated so an error would occur.
> 
> Yeah, they would not be replicated, but not lead ERROR.
> So should we say like "Creating tables on the source server..."?
> 

Perhaps. Clarifying the docs would help, but it depends on the wording.
For example, I doubt this should talk about "creating tables" because
there are other DDL that (probably) could cause issues (like adding a
column to the table, or something like that).

>> 5) slot / publication / subscription name
>>
>> I find it somewhat annoying it's not possible to specify names for
>> objects created by the tool - replication slots, publication and
>> subscriptions. If this is meant to be a replica running for a while,
>> after a while I'll have no idea what pg_createsubscriber_569853 or
>> pg_createsubscriber_459548_2348239 was meant for.
>>
>> This is particularly annoying because renaming these objects later is
>> either not supported at all (e.g. for replication slots), or may be
>> quite difficult (e.g. publications).
>>
>> I do realize there are challenges with custom names (say, if there are
>> multiple databases to replicate), but can't we support some simple
>> formatting with basic placeholders? So we could specify
>>
>> --slot-name "myslot_%d_%p"
>>
>> or something like that?
> 
> Not sure we can do in the first version, but looks nice. One concern is that I
> cannot find applications which accepts escape strings like log_line_prefix.
> (It may just because we do not have use-case.) Do you know examples?
> 

I can't think of a tool already doing that, but I think that's simply
because it was not needed. Why should we be concerned about this?

>> BTW what will happen if we convert multiple standbys? Can't they all get
>> the same slot name (they all have the same database OID, and I'm not
>> sure how much entropy the PID has)?
> 
> I tested and the second try did not work. The primal reason was the name of publication
> - pg_createsubscriber_%u (oid).
> FYI - previously we can reuse same publications, but based on my comment [2] the
> feature was removed. It might be too optimistic.
> 

OK. I could be convinced the other limitations are reasonable for v1 and
can be improved later, but this seems like something that needs fixing.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Corey Huinker
Date:
Subject: Re: Statistics Import and Export
Next
From: Dean Rasheed
Date:
Subject: Re: Adding OLD/NEW support to RETURNING