Thread: speed up a logical replica setup


From
"Euler Taveira"
Date:
Hi,

Logical replication has been used for migrations with minimal downtime. However,
if you are dealing with a big database, the amount of required resources (disk
-- due to WAL retention) increases as the backlog (WAL) grows. Unless you
have a generous amount of resources and can wait for a long period of time until
the new replica catches up, creating a logical replica is impractical on
large databases.

The general idea is to create and convert a physical replica or a base backup
(with archived WAL files available) into a logical replica. The initial data
copy and catchup tend to be faster on a physical replica. This technique has
been successfully used in pglogical_create_subscriber [1].

A new tool called pg_subscriber does this conversion and is tightly integrated
with Postgres.

DESIGN

The conversion requires 8 steps.

1. Check if the target data directory has the same system identifier as the
source data directory.
2. Stop the target server if it is running as a standby server. (Modifying
recovery parameters requires a restart.)
3. Create one replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent LSN.
(This consistent LSN will be used as (a) a stopping point for the recovery
process and (b) a starting point for the subscriptions.)
4. Write recovery parameters into the target data directory and start the
target server. (Wait until the target server is promoted.)
5. Create one publication (FOR ALL TABLES) per specified database on the source
server.
6. Create one subscription per specified database on the target server. (Use
the replication slot and publication created in a previous step. Don't enable
the subscriptions yet.)
7. Set the replication progress to the consistent LSN that was obtained in a
previous step.
8. Enable the subscription for each specified database on the target server.
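
The per-database steps above can be sketched as SQL (a hand-written sketch:
the object names and the LSN are illustrative, and step 4 is done via recovery
parameters rather than SQL):

```sql
-- Step 3, on the source, per database (plus one extra slot for the
-- consistent LSN):
SELECT pg_create_logical_replication_slot('pg_subscriber_db1', 'pgoutput');

-- Step 5, on the source, per database:
CREATE PUBLICATION pg_subscriber_db1 FOR ALL TABLES;

-- Step 6, on the target, per database, after promotion:
CREATE SUBSCRIPTION pg_subscriber_db1
  CONNECTION 'host=source dbname=db1' PUBLICATION pg_subscriber_db1
  WITH (create_slot = false, slot_name = 'pg_subscriber_db1',
        copy_data = false, enabled = false);

-- Steps 7 and 8, on the target: advance the origin (named after the
-- subscription OID) to the consistent LSN, then enable:
SELECT pg_replication_origin_advance('pg_' || oid, '0/3000148')
  FROM pg_subscription WHERE subname = 'pg_subscriber_db1';
ALTER SUBSCRIPTION pg_subscriber_db1 ENABLE;
```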

This tool does not take a base backup. That can certainly be added later;
there is already a tool that does it: pg_basebackup.

There is a --subscriber-conninfo option to specify the subscriber connection
string; however, we could remove it, since this tool runs on the subscriber
and can build a connection string itself.

NAME

I'm not sure about the proposed name. I came up with this one because it is not
too long. The most recently added tools use a pg_ prefix, a verb (action), and
an object. pg_initsubscriber and pg_createsubscriber are names I considered,
but I'm not excited about them.

DOCUMENTATION

Documentation is included and describes this tool.

TESTS

Basic tests are included. More tests are needed to properly exercise this tool.


Comments?



--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Andres Freund
Date:
Hi,

On 2022-02-21 09:09:12 -0300, Euler Taveira wrote:
> Logical replication has been used to migration with minimal downtime. However,
> if you are dealing with a big database, the amount of required resources (disk
> -- due to WAL retention) increases as the backlog (WAL) increases. Unless you
> have a generous amount of resources and can wait for long period of time until
> the new replica catches up, creating a logical replica is impracticable on
> large databases.

Indeed.


> DESIGN
> 
> The conversion requires 8 steps.
> 
> 1. Check if the target data directory has the same system identifier than the
> source data directory.
> 2. Stop the target server if it is running as a standby server. (Modify
> recovery parameters requires a restart.)
> 3. Create one replication slot per specified database on the source server. One
> additional replication slot is created at the end to get the consistent LSN
> (This consistent LSN will be used as (a) a stopping point for the recovery
> process and (b) a starting point for the subscriptions).
> 4. Write recovery parameters into the target data directory and start the
> target server (Wait until the target server is promoted).
> 5. Create one publication (FOR ALL TABLES) per specified database on the source
> server.
> 6. Create one subscription per specified database on the target server (Use
> replication slot and publication created in a previous step. Don't enable the
> subscriptions yet).
> 7. Sets the replication progress to the consistent LSN that was got in a
> previous step.
> 8. Enable the subscription for each specified database on the target server.

I think the system identifier should also be changed, otherwise you can way
too easily get into situations trying to apply WAL from different systems to
each other. Not going to end well, obviously.


> This tool does not take a base backup. It can certainly be included later.
> There is already a tool do it: pg_basebackup.

It would make sense to allow calling pg_basebackup from the new tool. Perhaps
with a --pg-basebackup-parameters option or such.



Greetings,

Andres Freund



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Feb 21, 2022, at 8:28 PM, Andres Freund wrote:
> I think the system identifier should also be changed, otherwise you can way
> too easily get into situations trying to apply WAL from different systems to
> each other. Not going to end well, obviously.

Good point.

> This tool does not take a base backup. It can certainly be included later.
> There is already a tool do it: pg_basebackup.

> It would make sense to allow to call pg_basebackup from the new tool. Perhaps
> with a --pg-basebackup-parameters or such.

Yeah. I'm planning to do that in the near future. There are a few questions on
my mind. Should we call pg_basebackup directly (like
pglogical_create_subscriber does) or use the base backup machinery to obtain
the backup? If we choose the former, we should probably sanitize
--pg-basebackup-parameters to allow only a subset of the command-line options.
AFAICS the latter requires some refactoring in the pg_basebackup code --
e.g. exposing at least one function (BaseBackup?) that accepts a struct of
command-line options as a parameter and returns success/failure. Another
possibility is to implement a simple BASE_BACKUP command via the replication
protocol. The disadvantages are: (a) it could duplicate code and (b) it might
require maintenance if new options are added to the BASE_BACKUP command.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Feb 21, 2022 at 5:41 PM Euler Taveira <euler@eulerto.com> wrote:
>
> Logical replication has been used to migration with minimal downtime. However,
> if you are dealing with a big database, the amount of required resources (disk
> -- due to WAL retention) increases as the backlog (WAL) increases. Unless you
> have a generous amount of resources and can wait for long period of time until
> the new replica catches up, creating a logical replica is impracticable on
> large databases.
>
> The general idea is to create and convert a physical replica or a base backup
> (archived WAL files available) into a logical replica. The initial data copy
> and catchup tends to be faster on a physical replica. This technique has been
> successfully used in pglogical_create_subscriber [1].
>

Sounds like a promising idea.

> A new tool called pg_subscriber does this conversion and is tightly integrated
> with Postgres.
>
> DESIGN
>
> The conversion requires 8 steps.
>
> 1. Check if the target data directory has the same system identifier than the
> source data directory.
> 2. Stop the target server if it is running as a standby server. (Modify
> recovery parameters requires a restart.)
> 3. Create one replication slot per specified database on the source server. One
> additional replication slot is created at the end to get the consistent LSN
> (This consistent LSN will be used as (a) a stopping point for the recovery
> process and (b) a starting point for the subscriptions).
>

What is the need to create an extra slot other than the slot for each
database? Can't we use the largest LSN returned by the slots as the
recovery-target-lsn and the starting point for the subscriptions?
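
(If I understand the suggestion, the largest LSN could in principle be
obtained with something like the following, assuming the tool's internal
slot-name prefix:)

```sql
SELECT max(confirmed_flush_lsn)
  FROM pg_replication_slots
 WHERE slot_name LIKE 'pg_subscriber_%';
```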

Also, how will these additional slots get freed or reused if, say, the
server crashes or is stopped after creating the slots but before creating
the subscriptions? Users won't even know the names of such slots, as they
are created internally.

> 4. Write recovery parameters into the target data directory and start the
> target server (Wait until the target server is promoted).
> 5. Create one publication (FOR ALL TABLES) per specified database on the source
> server.
> 6. Create one subscription per specified database on the target server (Use
> replication slot and publication created in a previous step. Don't enable the
> subscriptions yet).
> 7. Sets the replication progress to the consistent LSN that was got in a
> previous step.
> 8. Enable the subscription for each specified database on the target server.
>
> This tool does not take a base backup. It can certainly be included later.
> There is already a tool do it: pg_basebackup.
>

The backup will include all the databases present on the source server. Do
we need to provide a way/recommendation to remove the databases that are not
required?

Can we see some numbers with various database (cluster) sizes, to see how
the time compares to the traditional method for small to large databases?
This might help in advising users on when to use this tool.


-- 
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Andreas Karlsson
Date:
On 2/21/22 13:09, Euler Taveira wrote:
> DESIGN
> 
> The conversion requires 8 steps.
> 
> 1. Check if the target data directory has the same system identifier 
> than the
> source data directory.
> 2. Stop the target server if it is running as a standby server. (Modify
> recovery parameters requires a restart.)
> 3. Create one replication slot per specified database on the source 
> server. One
> additional replication slot is created at the end to get the consistent LSN
> (This consistent LSN will be used as (a) a stopping point for the recovery
> process and (b) a starting point for the subscriptions).
> 4. Write recovery parameters into the target data directory and start the
> target server (Wait until the target server is promoted).
> 5. Create one publication (FOR ALL TABLES) per specified database on the 
> source
> server.
> 6. Create one subscription per specified database on the target server (Use
> replication slot and publication created in a previous step. Don't 
> enable the
> subscriptions yet).
> 7. Sets the replication progress to the consistent LSN that was got in a
> previous step.
> 8. Enable the subscription for each specified database on the target server.

Very interesting!

I actually proposed a similar design just a couple of weeks ago for
upgrading a database of a customer of mine. We have not tried it yet, so it
is not decided whether we will go ahead with it.

In our case the goal is a bit different so my idea is that we will use 
pg_dump/pg_restore (or pg_upgrade and then some manual cleanup if 
pg_dump/pg_restore is too slow) on the target server. The goal of this 
design is to get a nice clean logical replica at the new version of 
PostgreSQL with indexes with the correct collations, all old invalid 
constraints validated, minimal bloat, etc. And all of this without 
creating bloat or putting too much load on the old master during the 
process. We have plenty of disk space and plenty of time so those are 
not limitations in our case. I can go into more detail if there is interest.

It is nice to see that our approach is not entirely unique. :) And I 
will take a look at this patch when I find the time.

Andreas



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 21.02.22 13:09, Euler Taveira wrote:
> A new tool called pg_subscriber does this conversion and is tightly 
> integrated
> with Postgres.

Are we comfortable with the name pg_subscriber?  It seems too general.
Are we planning other subscriber-related operations in the future?  If
so, we should at least make this one use a --create option or
something like that.

  doc/src/sgml/ref/pg_subscriber.sgml

Attached is a patch that reorganizes the man page a bit.  I moved the
description of the steps to the Notes section and formatted it
differently.  I think the steps are interesting but not essential for
using the program, so I wanted to get them out of the main
description.

  src/bin/pg_subscriber/pg_subscriber.c

+   if (made_temp_replslot)
+   {
+       conn = connect_database(dbinfo[0].pubconninfo, true);
+       drop_replication_slot(conn, &dbinfo[0], temp_replslot);
+       disconnect_database(conn);
+   }

Temp slots don't need to be cleaned up.

+/*
+ * Obtain the system identifier from control file. It will be used to compare
+ * if a data directory is a clone of another one. This routine is used locally
+ * and avoids a replication connection.
+ */
+static char *
+get_control_from_datadir(const char *datadir)

This could return uint64 directly, without string conversion.
get_sysid_from_conn() could then convert to uint64 internally.

+       {"verbose", no_argument, NULL, 'v'},

I'm not sure if the --verbose option is all that useful.

+       {"stop-subscriber", no_argument, NULL, 1},

This option doesn't seem to be otherwise supported or documented.

+   pub_sysid = pg_malloc(32);
+   pub_sysid = get_sysid_from_conn(dbinfo[0].pubconninfo);
+   sub_sysid = pg_malloc(32);
+   sub_sysid = get_control_from_datadir(subscriber_dir);

These mallocs don't appear to be of any use.

+   dbname_conninfo = pg_malloc(NAMEDATALEN);

This seems wrong.

Overall, this code could use a bit more structure.  There are
a lot of helper functions that don't do much and are mostly
duplicate run-this-SQL-command calls.  But the main() function is
still huge.  There is room for refinement.

  src/bin/pg_subscriber/t/001_basic.pl

Good start, but obviously, we'll need some real test cases here also.

  src/bin/initdb/initdb.c
  src/bin/pg_ctl/pg_ctl.c
  src/common/file_utils.c
  src/include/common/file_utils.h

I recommend skipping this refactoring.  The readfile() function from
pg_ctl is not general enough to warrant the pride of place of a
globally available function.  Note that it is specifically geared
toward some of pg_ctl's requirements, for example that the underlying
file can change while it is being read.

The requirements of pg_subscriber can be satisfied more easily: Just
call pg_ctl to start the server.  You are already using that in
pg_subscriber.  Is there a reason it can't be used here as well?
Attachment

Re: speed up a logical replica setup

From
Andrew Dunstan
Date:
On 3/15/22 09:51, Peter Eisentraut wrote:
> On 21.02.22 13:09, Euler Taveira wrote:
>> A new tool called pg_subscriber does this conversion and is tightly
>> integrated
>> with Postgres.
>
> Are we comfortable with the name pg_subscriber?  It seems too general.
> Are we planning other subscriber-related operations in the future?  If
> so, we should at least make this one use a --create option or
> something like that.


Not really sold on the name (and I didn't much like the name
pglogical_create_subscriber either, although it's a cool facility and
I'm happy to see us adopting something like it).

ISTM we should have a name that conveys that we are *converting* a
replica or equivalent to a subscriber.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: speed up a logical replica setup

From
Andres Freund
Date:
Hi,

On 2022-02-21 09:09:12 -0300, Euler Taveira wrote:
> A new tool called pg_subscriber does this conversion and is tightly integrated
> with Postgres.

Given that this was submitted just before the last CF, is a patch of
nontrivial size, and hasn't made significant progress, ISTM it should be
moved to the next CF?

It currently fails in cfbot, but that's likely just due to Peter's incremental
patch. Perhaps you could make sure the patch still applies and repost?

Greetings,

Andres Freund



Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:


On Fri, 18 Mar 2022 at 19:34 Andrew Dunstan <andrew@dunslane.net> wrote:
> On 3/15/22 09:51, Peter Eisentraut wrote:
>> On 21.02.22 13:09, Euler Taveira wrote:
>>> A new tool called pg_subscriber does this conversion and is tightly
>>> integrated
>>> with Postgres.
>>
>> Are we comfortable with the name pg_subscriber?  It seems too general.
>> Are we planning other subscriber-related operations in the future?  If
>> so, we should at least make this one use a --create option or
>> something like that.
>
> Not really sold on the name (and I didn't much like the name
> pglogical_create_subscriber either, although it's a cool facility and
> I'm happy to see us adopting something like it).
>
> ISTM we should have a name that conveys that we are *converting* a
> replica or equivalent to a subscriber.


Some time ago I did a POC on it [1] and I used the name pg_create_subscriber

--
Fabrízio de Royes Mello

Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 18.03.22 23:34, Andrew Dunstan wrote:
> On 3/15/22 09:51, Peter Eisentraut wrote:
>> On 21.02.22 13:09, Euler Taveira wrote:
>>> A new tool called pg_subscriber does this conversion and is tightly
>>> integrated
>>> with Postgres.
>>
>> Are we comfortable with the name pg_subscriber?  It seems too general.
>> Are we planning other subscriber-related operations in the future?  If
>> so, we should at least make this one use a --create option or
>> something like that.
> 
> 
> Not really sold on the name (and I didn't much like the name
> pglogical_create_subscriber either, although it's a cool facility and
> I'm happy to see us adopting something like it).
> 
> ISTM we should have a name that conveys that we are *converting* a
> replica or equivalent to a subscriber.

The pglogical tool includes the pg_basebackup run, so it actually 
"creates" the subscriber from scratch.  Whether this tool is also doing 
that is still being discussed.



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 22.03.22 02:25, Andres Freund wrote:
> On 2022-02-21 09:09:12 -0300, Euler Taveira wrote:
>> A new tool called pg_subscriber does this conversion and is tightly integrated
>> with Postgres.
> 
> Given that this has been submitted just before the last CF and is a patch of
> nontrivial size, has't made significant progress ISTM it should be moved to
> the next CF?

done



Re: speed up a logical replica setup

From
Jacob Champion
Date:
This entry has been waiting on author input for a while (our current
threshold is roughly two weeks), so I've marked it Returned with
Feedback.

Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting

    https://commitfest.postgresql.org/38/3556/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Feb 21, 2022, at 9:09 AM, Euler Taveira wrote:
> A new tool called pg_subscriber does this conversion and is tightly integrated
> with Postgres.

After a long period of inactivity, I'm back to this client tool. As suggested
by Andres, I added a new helper function to change the system identifier as the
last step. I also thought about including the pg_basebackup support but decided
to keep it simple (at least for this current version). The user can always
execute pg_basebackup as a preliminary step to create a standby replica and it
will work. (I will post a separate patch that includes the pg_basebackup
support on the top of this one.)

Amit asked if an extra replication slot is required. It is not. The reason I
keep it is as a reminder that at least the last replication slot needs to be
created after pg_basebackup finishes (the pg_backup_stop() call). Regarding the
atexit() routine, it does its best to remove all the objects it created;
however, there is no guarantee it can remove them, because that depends on
external resources such as connectivity and authorization. I added a new
warning message if it cannot drop the transient replication slot. It is
probably a good idea to add such a warning message to the cleanup routine too.
Related to this point, another possible feature is one that checks for and
removes all leftover objects. The transient replication slot is fine because it
should always be removed at the end. However, the big question is how to detect
that you are not removing objects (publications, subscriptions, replication
slots) from a successful conversion.
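
Until such a feature exists, a manual check along these lines could list the
candidates (assuming the tool's naming prefix; the patterns are illustrative):

```sql
-- Possible leftover objects from an aborted conversion, on source and target:
SELECT slot_name FROM pg_replication_slots WHERE slot_name LIKE 'pg_subscriber_%';
SELECT pubname FROM pg_publication WHERE pubname LIKE 'pg_subscriber_%';
SELECT subname FROM pg_subscription WHERE subname LIKE 'pg_subscriber_%';
```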

Amit also asked about setting up a logical replica with m databases, where m is
less than the total number of databases. One option is to remove the "extra"
databases on the target server after promoting the physical replica, or in one
of the last steps. Maybe it is time to propose a partial physical replica that
contains only a subset of the databases on the primary. (I'm not volunteering
for it.) Alternatively, pg_basebackup could gain an option to remove these
"extra" databases so this tool can take advantage of it.

Let's continue with the bike shedding... I agree with Peter E that this name
does not express what this tool does. At the moment, it only has one action:
create. If I had to suggest other actions, I would say it could support a
switchover option too (that removes the infrastructure created by this tool).
If we decide to keep this name, it would be a good idea to add an option to
indicate which action it is executing (similar to pg_recvlogical), as
suggested by Peter.

I included the documentation cleanups that Peter E shared. I also made small
adjustments to the documentation. It probably deserves a warning section
about the cleanup.

I refactored the transient replication slot code and decided to use a permanent
(instead of temporary) slot to avoid keeping a replication connection open for
a long time until the target server catches up.

The system identifier functions (get_control_from_datadir() and
get_sysid_from_conn()) now return uint64, as suggested by Peter.

After reflection, the --verbose option should be renamed to --progress. There
are also some messages that should be converted to debug messages.

I fixed the useless mallocs. I rearranged the code a bit, but main() still has
~370 lines (~255 lines without options/validation). I'm trying to rearrange the
code to make it easier to read and at the same time reduce the size of main().
I already have a few candidates in mind, such as the code that stops the
standby and the part that writes the recovery parameters. I removed the
refactoring I proposed in the previous patch; the current code relies on the
pg_ctl --wait behavior. Are there issues with this choice? Well, one annoying
aspect is that pg_ctl does not have a "wait forever" option. If one of the
pg_ctl calls fails, you probably have to start over (unless you understand the
pg_subscriber internals and fix the setup yourself). You have to choose an
arbitrary timeout value and expect that pg_ctl *does* perform the action
within that timeout.

Real tests are included. The cleanup code does not have coverage because a
simple reproducible case isn't easy to construct. I'm also not sure it is
worth it; we can explain it in the warning section that was proposed.

It is still a WIP but I would like to share it and get some feedback.


--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Ashutosh Bapat
Date:
On Mon, Oct 23, 2023 at 9:34 AM Euler Taveira <euler@eulerto.com> wrote:

>
> It is still a WIP but I would like to share it and get some feedback.
>
>

I have started reviewing the patch. I have just read through all the
code. It's well documented and clear. Next I will review the design in
detail. Here are a couple of minor comments:
1.
+tests += {
+ 'name': 'pg_subscriber',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'tap': {
+ 'tests': [
+ 't/001_basic.pl',

COMMENT
Shouldn't we include 002_standby.pl?

2. CreateReplicationSlotLSN is not used anywhere. Instead I see
create_logical_replication_slot() in pg_subscriber.c. Which of the two do
you intend to use in the end?

--
Best Wishes,
Ashutosh Bapat



Re: speed up a logical replica setup

From
shihao zhong
Date:
I think this is a duplicate of https://commitfest.postgresql.org/45/4637/

The new status of this patch is: Waiting on Author

Re: speed up a logical replica setup

From
Ashutosh Bapat
Date:
On Thu, Oct 26, 2023 at 5:17 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, Oct 23, 2023 at 9:34 AM Euler Taveira <euler@eulerto.com> wrote:
>
> >
> > It is still a WIP but I would like to share it and get some feedback.
> >
> >
>
> I have started reviewing the patch. I have just read through all the
> code. It's well documented and clear. Next I will review the design in
> detail.

Here are some comments about functionality and design.

+ <step>
+ <para>
+ <application>pg_subscriber</application> creates one replication slot for
+ each specified database on the source server. The replication slot name
+ contains a <literal>pg_subscriber</literal> prefix. These replication
+ slots will be used by the subscriptions in a future step. Another
+ replication slot is used to get a consistent start location. This
+ consistent LSN will be used as a stopping point in the <xref
+ linkend="guc-recovery-target-lsn"/> parameter and by the
+ subscriptions as a replication starting point. It guarantees that no
+ transaction will be lost.
+ </para>
+ </step>

CREATE_REPLICATION_SLOT would wait for any incomplete transaction to
complete. So it may not be possible to have an incomplete transaction
on the standby when it comes out of recovery. Am I correct? Can we please
have a test case for this scenario? What about prepared transactions?

+
+ <step>
+ <para>
+ <application>pg_subscriber</application> writes recovery parameters into
+ the target data directory and starts the target server. It specifies an LSN
+ (the consistent LSN that was obtained in the previous step) of the
+ write-ahead log location up to which recovery will proceed. It also
+ specifies <literal>promote</literal> as the action that the server should
+ take once the recovery target is reached. This step finishes once the
+ server ends standby mode and is accepting read-write operations.
+ </para>
+ </step>
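
(For reference, the recovery parameters described here amount to something
like the following; the values are illustrative:)

```
# appended to postgresql.auto.conf on the target (a standby.signal file is
# also present while the server is in recovery)
recovery_target_lsn = '0/3000148'
recovery_target_action = 'promote'
```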

At this stage the standby would have various replication objects like
publications, subscriptions, and origins inherited from the upstream
server, possibly very much active. With failover slots, it might also
inherit replication slots. Is it intended that the new subscriber acts as a
publisher for the source's subscribers, OR that the new subscriber should
subscribe to the upstreams of the source? Some use cases like a logical
standby might require that, but a multi-master multi-node setup may not.
The behaviour should be user configurable.

There may be other objects in this category which need special consideration on
the subscriber. I haven't fully thought through the list of such objects.

+ uses the replication slot that was created in a previous step. The
+ subscription is created but it is not enabled yet. The reason is the
+ replication progress must be set to the consistent LSN but replication
+ origin name contains the subscription oid in its name. Hence, the

Not able to understand the sentence "The reason is ... in its name".
Why does the subscription OID in the origin name matter?

+ <para>
+ <application>pg_subscriber</application> stops the target server to change
+ its system identifier.
+ </para>

I expected the subscriber to be started after this step.

Why do we need pg_resetwal?

+ appendPQExpBuffer(str, "CREATE_REPLICATION_SLOT \"%s\"", slot_name);
+ appendPQExpBufferStr(str, " LOGICAL \"pgoutput\" NOEXPORT_SNAPSHOT");

Hardcoding the output plugin name would limit this utility to the
built-in plugin. Any reason for that limitation?
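
(For context, the code quoted above builds the replication-protocol command,
which conveniently returns the consistent point; an illustrative session,
with a made-up slot name and LSN:)

```sql
-- On a connection opened with "replication=database" in the conninfo:
CREATE_REPLICATION_SLOT "pg_subscriber_16384" LOGICAL "pgoutput" NOEXPORT_SNAPSHOT;
--  slot_name           | consistent_point | snapshot_name | output_plugin
-- ---------------------+------------------+---------------+---------------
--  pg_subscriber_16384 | 0/3000148        |               | pgoutput
```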

In its current form the utility creates a logical subscriber which
subscribes to all the tables (and sequences, once we have sequence
replication). But it would be useful even in the case of selective
replication from a very large database. In such a case the new subscriber
would need to (a) remove the unwanted objects and (b) subscribe to
publications publishing the "interesting" objects. We don't need to support
this case now, but the current functionality (including the interface) and
design shouldn't prevent us from doing so. Have you thought about this case?

I noticed some differences between this and a similar utility
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_create_subscriber.c.
I will be reviewing these differences next to see if we are missing
anything here.

--
Best Wishes,
Ashutosh Bapat



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Oct 31, 2023, at 11:46 PM, shihao zhong wrote:
> I think this is duplicate with https://commitfest.postgresql.org/45/4637/
>
> The new status of this patch is: Waiting on Author

I withdrew the other entry.


--
Euler Taveira

Re: speed up a logical replica setup

From
Ashutosh Bapat
Date:
On Wed, Nov 1, 2023 at 7:10 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> I noticed some differences between this and a similar utility
> https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_create_subscriber.c.
> I will be reviewing these differences next to see if we are missing
> anything here.

Some more missing things to discuss

Handling signals - The utility cleans up leftover objects on exit.
But the default signal handlers will make the utility exit without
proper cleanup [1]. The signal handlers could clean up the objects
themselves or at least report the objects that need to be cleaned up.

Idempotent behaviour - Given that the utility will be used when a very
large amount of data is involved, redoing everything after a network
glitch or a temporary failure should be avoided. This is true when users
start with a base backup. Again, I don't think we should try to be
idempotent in v1, but the current design shouldn't stop us from doing so
later. I didn't find anything like that in my review, but it is something
to keep in mind.

That finishes my first round of review. I will wait for your updated
patches before the next round.

[1] NOTEs section in man atexit().


--
Best Wishes,
Ashutosh Bapat



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 23.10.23 05:53, Euler Taveira wrote:
> Let's continue with the bike shedding... I agree with Peter E that this name
> does not express what this tool is. At the moment, it only have one action:
> create. If I have to suggest other actions I would say that it could support
> switchover option too (that removes the infrastructure created by this 
> tool).
> If we decide to keep this name, it should be a good idea to add an option to
> indicate what action it is executing (similar to pg_recvlogical) as 
> suggested
> by Peter.

Speaking of which, would it make sense to put this tool (whatever the 
name) into the pg_basebackup directory?  It's sort of related, and it 
also shares some code.




Re: speed up a logical replica setup

From
Michael Paquier
Date:
On Tue, Nov 07, 2023 at 10:00:39PM +0100, Peter Eisentraut wrote:
> Speaking of which, would it make sense to put this tool (whatever the name)
> into the pg_basebackup directory?  It's sort of related, and it also shares
> some code.

I've read the patch, and the additions to streamutil.h and
streamutil.c make it kind of natural to have it sit in pg_basebackup/.
There's pg_recvlogical already there.  I am wondering about two
things, though:
- Should the subdirectory pg_basebackup be renamed into something more
generic at this point?  All these things are frontend tools that deal
in some way with the replication protocol to do their work.  Say
a replication_tools?
- And if it would be better to refactor some of the code generic to
all these streaming tools to fe_utils.  What makes streamutil.h a bit
less pluggable are all its extern variables to control the connection,
but perhaps that can be an advantage, as well, in some cases.
--
Michael

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Nov 7, 2023, at 8:12 PM, Michael Paquier wrote:
On Tue, Nov 07, 2023 at 10:00:39PM +0100, Peter Eisentraut wrote:
> Speaking of which, would it make sense to put this tool (whatever the name)
> into the pg_basebackup directory?  It's sort of related, and it also shares
> some code.

I used CreateReplicationSlot() from streamutil.h at first, but decided to use
the CREATE_REPLICATION_SLOT command directly because I need the LSN as output.
As you noticed at the time, I didn't want a dependency on the pg_basebackup
header files; if we move this binary to the pg_basebackup directory, it seems
natural to refactor the referred function and use it.

I've read the patch, and the additions to streamutil.h and
streamutil.c make it kind of natural to have it sit in pg_basebackup/.
There's pg_recvlogical already there.  I am wondering about two
things, though:
- Should the subdirectory pg_basebackup be renamed into something more
generic at this point?  All these things are frontend tools that deal
in some way with the replication protocol to do their work.  Say
a replication_tools?

It is a good fit for this tool, since it is another replication tool. I also
agree with renaming the directory; it seems confusing that the directory has
the same name as one of its binaries while also containing other related
binaries.

- And if it would be better to refactor some of the code generic to
all these streaming tools to fe_utils.  What makes streamutil.h a bit
less pluggable are all its extern variables to control the connection,
but perhaps that can be an advantage, as well, in some cases.

I like it. There are common functions such as GetConnection(),
CreateReplicationSlot(), DropReplicationSlot() and RunIdentifySystem() that
are used by all of these replication tools. We could turn the extern variables
into parameters to make streamutil.h pluggable.
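As a rough illustration of that refactoring (names invented; the real
streamutil API differs), the extern connection variables could become a struct
that each helper receives explicitly:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical replacement for streamutil's extern connection globals
 * (dbhost, dbport, dbuser, ...): bundle them into a struct that callers
 * pass explicitly, so the helpers carry no file-level state. */
typedef struct ConnOptions
{
    const char *host;
    const char *port;
    const char *username;
} ConnOptions;

/* A helper that previously read extern variables now receives its
 * settings as a parameter, making it usable from fe_utils. */
int
build_conninfo(const ConnOptions *opts, char *buf, size_t buflen)
{
    return snprintf(buf, buflen,
                    "host=%s port=%s user=%s replication=database",
                    opts->host, opts->port, opts->username);
}
```

Each wrapper (GetConnection(), CreateReplicationSlot(), ...) would then take
such a struct instead of consulting globals.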


--
Euler Taveira

Re: speed up a logical replica setup

From
Michael Paquier
Date:
On Wed, Nov 08, 2023 at 09:50:47AM -0300, Euler Taveira wrote:
> On Tue, Nov 7, 2023, at 8:12 PM, Michael Paquier wrote:
> I used the CreateReplicationSlot() from streamutil.h but decided to use the
> CREATE_REPLICATION_SLOT command directly because it needs the LSN as output. As
> you noticed at that time I wouldn't like a dependency in the pg_basebackup
> header files; if we move this binary to base backup directory, it seems natural
> to refactor the referred function and use it.

Right.  That should be OK to store that in an optional XLogRecPtr
pointer, aka by letting the option to pass NULL as argument of the
function if the caller needs nothing.
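The shape Michael suggests — an out-parameter that may be NULL — would look
roughly like this (types and names are illustrative, not the actual streamutil
signature):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the server's typedef */

/* Sketch: CreateReplicationSlot() extended with an optional out-parameter
 * for the LSN reported by CREATE_REPLICATION_SLOT.  Callers that do not
 * need the LSN pass NULL. */
bool
create_slot_sketch(const char *slot_name, XLogRecPtr *consistent_lsn)
{
    /* pretend this value came back from the replication command */
    XLogRecPtr lsn = ((XLogRecPtr) 0x3 << 32) | 0x148;

    (void) slot_name;           /* a real version would run the command */
    if (consistent_lsn != NULL)
        *consistent_lsn = lsn;
    return true;
}
```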

>> I've read the patch, and the additions to streamutil.h and
>> streamutil.c make it kind of natural to have it sit in pg_basebackup/.
>> There's pg_recvlogical already there.  I am wondering about two
>> things, though:
>> - Should the subdirectory pg_basebackup be renamed into something more
>> generic at this point?  All these things are frontend tools that deal
>> in some way with the replication protocol to do their work.  Say
>> a replication_tools?
>
> It is a good fit for this tool since it is another replication tool. I also
> agree with the directory renaming; it seems confusing that the directory has
> the same name as one binary but also contains other related binaries in it.

Or cluster_tools?  Or stream_tools?  replication_tools may be OK, but
I have a bad sense in naming new things around here.  So if anybody
has a better idea, feel free..

>> - And if it would be better to refactor some of the code generic to
>> all these streaming tools to fe_utils.  What makes streamutil.h a bit
>> less pluggable are all its extern variables to control the connection,
>> but perhaps that can be an advantage, as well, in some cases.
>
> I like it. There are common functions such as GetConnection(),
> CreateReplicationSlot(), DropReplicationSlot() and RunIdentifySystem() that is
> used by all of these replication tools. We can move the extern variables into
> parameters to have a pluggable streamutil.h.

And perhaps RetrieveWalSegSize() as well as GetSlotInformation().
These kick replication commands.
--
Michael

Attachment

Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 08.11.23 00:12, Michael Paquier wrote:
> - Should the subdirectory pg_basebackup be renamed into something more
> generic at this point?  All these things are frontend tools that deal
> in some way with the replication protocol to do their work.  Say
> a replication_tools?

Seems like unnecessary churn.  Nobody has complained about any of the 
other tools in there.

> - And if it would be better to refactor some of the code generic to
> all these streaming tools to fe_utils.  What makes streamutil.h a bit
> less pluggable are all its extern variables to control the connection,
> but perhaps that can be an advantage, as well, in some cases.

Does anyone outside of pg_basebackup + existing friends + new friend 
need that?  Seems like extra complications.




Re: speed up a logical replica setup

From
Michael Paquier
Date:
On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> On 08.11.23 00:12, Michael Paquier wrote:
>> - Should the subdirectory pg_basebackup be renamed into something more
>> generic at this point?  All these things are frontend tools that deal
>> in some way with the replication protocol to do their work.  Say
>> a replication_tools?
>
> Seems like unnecessary churn.  Nobody has complained about any of the other
> tools in there.

Not sure.  We rename things across releases in the tree from time to
time, and here that's straight-forward.

>> - And if it would be better to refactor some of the code generic to
>> all these streaming tools to fe_utils.  What makes streamutil.h a bit
>> less pluggable are all its extern variables to control the connection,
>> but perhaps that can be an advantage, as well, in some cases.
>
> Does anyone outside of pg_basebackup + existing friends + new friend need
> that?  Seems like extra complications.

Actually, yes, I've used these utility routines in some past work, and
having the wrapper routines able to run the replication commands in
fe_utils would have been nicer than having to link to a source tree.
--
Michael

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> On 08.11.23 00:12, Michael Paquier wrote:
>> - Should the subdirectory pg_basebackup be renamed into something more
>> generic at this point?  All these things are frontend tools that deal
>> in some way with the replication protocol to do their work.  Say
>> a replication_tools?

> Seems like unnecessary churn.  Nobody has complained about any of the other
> tools in there.

Not sure.  We rename things across releases in the tree from time to
time, and here that's straight-forward.

Based on this discussion it seems we have a consensus that this tool should be
in the pg_basebackup directory. (If/when we agree on the directory renaming, it
could be done in a separate patch.) Besides this move, v3 provides a dry-run
mode. It basically executes every routine but skips the steps that would make
modifications. It is a useful option for checking whether you will be able to
run the tool without hitting issues with connectivity, permissions, or existing
objects (replication slots, publications, subscriptions). Tests were slightly
improved. Messages were changed to *not* provide INFO messages by default;
--verbose provides INFO messages and --verbose --verbose also provides DEBUG
messages. I also refactored the connect_database() function so that the
connection always uses the logical replication mode. A bug in the transient
replication slot name was fixed. Ashutosh's review [1] was incorporated. The
code was also indented.

There are a few suggestions from Ashutosh [2] that I will reply in another
email.

I'm still planning to work on the following points:

1. improve the cleanup routine to point out leftover objects if there is any
   connection issue.
2. remove the physical replication slot if the standby is using one
   (primary_slot_name).
3. provide instructions to promote the logical replica to primary, that is,
   stop the replication between the nodes and remove the replication setup
   (publications, subscriptions, replication slots). Or even include another
   action to do it. We could add both too.

Point 1 should be done. Points 2 and 3 aren't essential but will provide a
nicer UI for users who would like to use them.




--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

On Wed, 6 Dec 2023 at 12:53, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
>
> On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > On 08.11.23 00:12, Michael Paquier wrote:
> >> - Should the subdirectory pg_basebackup be renamed into something more
> >> generic at this point?  All these things are frontend tools that deal
> >> in some way with the replication protocol to do their work.  Say
> >> a replication_tools?
> >
> > Seems like unnecessary churn.  Nobody has complained about any of the other
> > tools in there.
>
> Not sure.  We rename things across releases in the tree from time to
> time, and here that's straight-forward.
>
>
> Based on this discussion it seems we have a consensus that this tool should be
> in the pg_basebackup directory. (If/when we agree with the directory renaming,
> it could be done in a separate patch.) Besides this move, the v3 provides a dry
> run mode. It basically executes every routine but skip when should do
> modifications. It is an useful option to check if you will be able to run it
> without having issues with connectivity, permission, and existing objects
> (replication slots, publications, subscriptions). Tests were slightly improved.
> Messages were changed to *not* provide INFO messages by default and --verbose
> provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> also refactored the connect_database() function into which the connection will
> always use the logical replication mode. A bug was fixed in the transient
> replication slot name. Ashutosh review [1] was included. The code was also indented.
>
> There are a few suggestions from Ashutosh [2] that I will reply in another
> email.
>
> I'm still planning to work on the following points:
>
> 1. improve the cleanup routine to point out leftover objects if there is any
>    connection issue.
> 2. remove the physical replication slot if the standby is using one
>    (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
>    stop the replication between the nodes and remove the replication setup
>    (publications, subscriptions, replication slots). Or even include another
>    action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.
>
>
> [1] https://postgr.es/m/CAExHW5sCAU3NvPKd7msScQKvrBN-x_AdDQD-ZYAwOxuWG%3Doz1w%40mail.gmail.com
> [2] https://postgr.es/m/CAExHW5vHFemFvTUHe+7XWphVZJxrEXz5H3dD4UQi7CwmdMJQYg@mail.gmail.com
>

The changes in the file 'src/tools/msvc/Mkvcbuild.pm' seem unnecessary, as
the 'msvc' folder was removed by commit [1].

To review the changes, I did 'git reset --hard' to the commit just before
commit [1]. I tried to build postgres on my Windows machine using two
methods:
i. building using Visual Studio
ii. building using Meson

When I built the code using Visual Studio, the pg_subscriber binary was not
created on installing postgres. But when I built the code using Meson, the
pg_subscriber binary was created. Is this behaviour intentional?

[1] https://github.com/postgres/postgres/commit/1301c80b2167feb658a738fa4ceb1c23d0991e23

Thanks and Regards,
Shlok Kyal



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Dec 20, 2023, at 9:22 AM, Shlok Kyal wrote:
When I built the code using Visual Studio, on installing postgres,
pg_subscriber binary was not created.
But when I built the code using Meson, on installing postgres,
pg_subscriber binary was created.
Is this behaviour intentional?

No. I will update the patch accordingly. I suspect a fair number of patches
broke due to the MSVC build system removal.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Wed, Dec 6, 2023 at 12:53 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
>
> On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > On 08.11.23 00:12, Michael Paquier wrote:
> >> - Should the subdirectory pg_basebackup be renamed into something more
> >> generic at this point?  All these things are frontend tools that deal
> >> in some way with the replication protocol to do their work.  Say
> >> a replication_tools?
> >
> > Seems like unnecessary churn.  Nobody has complained about any of the other
> > tools in there.
>
> Not sure.  We rename things across releases in the tree from time to
> time, and here that's straight-forward.
>
>
> Based on this discussion it seems we have a consensus that this tool should be
> in the pg_basebackup directory. (If/when we agree with the directory renaming,
> it could be done in a separate patch.) Besides this move, the v3 provides a dry
> run mode. It basically executes every routine but skip when should do
> modifications. It is an useful option to check if you will be able to run it
> without having issues with connectivity, permission, and existing objects
> (replication slots, publications, subscriptions). Tests were slightly improved.
> Messages were changed to *not* provide INFO messages by default and --verbose
> provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> also refactored the connect_database() function into which the connection will
> always use the logical replication mode. A bug was fixed in the transient
> replication slot name. Ashutosh review [1] was included. The code was also indented.
>
> There are a few suggestions from Ashutosh [2] that I will reply in another
> email.
>
> I'm still planning to work on the following points:
>
> 1. improve the cleanup routine to point out leftover objects if there is any
>    connection issue.
>

I think this is an important part. Shall we try to write the pending objects
to be cleaned up to some file? We do something like that during the upgrade.

> 2. remove the physical replication slot if the standby is using one
>    (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
>    stop the replication between the nodes and remove the replication setup
>    (publications, subscriptions, replication slots). Or even include another
>    action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.
>

Isn't point 2 also essential? Otherwise, how would such a slot be advanced
or removed?

A few other points:
==============
1. Previously, I asked whether we need an additional replication slot
patch created to get consistent LSN and I see the following comment in
the patch:

+ *
+ * XXX we should probably use the last created replication slot to get a
+ * consistent LSN but it should be changed after adding pg_basebackup
+ * support.

Yeah, sure, we may want to do that after backup support and we can keep a
comment for the same, but as the patch stands today, I feel there is no good
reason to keep it. Also, is there a reason that we can't create the slots
after the backup is complete and before we write the recovery parameters?

2.
+ appendPQExpBuffer(str,
+   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
+   "WITH (create_slot = false, copy_data = false, enabled = false)",
+   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);

Shouldn't we enable two_phase by default for newly created
subscriptions? Is there a reason for not doing so?
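For reference, enabling it would only add one option to the generated command.
A self-contained sketch using snprintf in place of the patch's PQExpBuffer
(whether to enable it is debated later in the thread; this only shows the
shape of the change):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the SQL pg_subscriber would generate if two_phase were turned
 * on; it mirrors the patch's appendPQExpBuffer() call with one extra
 * option in the WITH list. */
int
build_create_subscription(char *buf, size_t buflen, const char *subname,
                          const char *conninfo, const char *pubname)
{
    return snprintf(buf, buflen,
                    "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
                    "WITH (create_slot = false, copy_data = false, "
                    "enabled = false, two_phase = true)",
                    subname, conninfo, pubname);
}
```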

3. How about sync slots on the physical standby if present? Do we want
to retain those as it is or do we need to remove those? We are
actively working on the patch [1] for the same.

4. Can we see some numbers with various sizes of databases (clusters) to see
how it impacts the time for small to large databases as compared to the
traditional method? This might help us with giving users advice on when to
use this tool. We can do this a bit later as well, when the patch is closer
to being ready for commit.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB5716DAF72265388A2AD424119495A%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Wed, Nov 1, 2023 at 7:10 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Here are some comments about functionality and design.
>
> + <step>
> + <para>
> + <application>pg_subscriber</application> creates one replication slot for
> + each specified database on the source server. The replication slot name
> + contains a <literal>pg_subscriber</literal> prefix. These replication
> + slots will be used by the subscriptions in a future step. Another
> + replication slot is used to get a consistent start location. This
> + consistent LSN will be used as a stopping point in the <xref
> + linkend="guc-recovery-target-lsn"/> parameter and by the
> + subscriptions as a replication starting point. It guarantees that no
> + transaction will be lost.
> + </para>
> + </step>
>
> CREATE_REPLICATION_SLOT would wait for any incomplete transaction to
> complete. So it may not be possible to have an incomplete transaction
> on standby when it comes out of recovery. Am I correct? Can we please
> have a testcase where we test this scenario? What about a prepared
> transactions?
>

It will wait even for prepared transactions to commit. So, there
shouldn't be any behavior difference for prepared and non-prepared
transactions.

> +
> + <step>
> + <para>
> + <application>pg_subscriber</application> writes recovery parameters into
> + the target data directory and start the target server. It specifies a LSN
> + (consistent LSN that was obtained in the previous step) of write-ahead
> + log location up to which recovery will proceed. It also specifies
> + <literal>promote</literal> as the action that the server should take once
> + the recovery target is reached. This step finishes once the server ends
> + standby mode and is accepting read-write operations.
> + </para>
> + </step>
>
> At this stage the standby would have various replication objects like
> publications, subscriptions, origins inherited from the upstream
> server and possibly very much active. With failover slots, it might
> inherit replication slots. Is it intended that the new subscriber also
> acts as publisher for source's subscribers OR that the new subscriber
> should subscribe to the upstreams of the source? Some use cases like
> logical standby might require that but a multi-master multi-node setup
> may not. The behaviour should be user configurable.
>

Good points, but even if we make it user-configurable, how would we exclude
such replication objects? And if we don't exclude them, what would be their
use? If one wants to use this as a logical standby, then we only need
publications and failover/sync slots in it, and there would also be no need
to create new slots and publications on the primary to make the current
physical standby a logical subscriber.

> There may be other objects in this category which need special consideration on
> the subscriber. I haven't fully thought through the list of such objects.
>
> + uses the replication slot that was created in a previous step. The
> + subscription is created but it is not enabled yet. The reason is the
> + replication progress must be set to the consistent LSN but replication
> + origin name contains the subscription oid in its name. Hence, the
>
> Not able to understand the sentence "The reason is ... in its name".
> Why is subscription OID in origin name matters?
>

Using the subscription OID in the origin name is probably to uniquely
identify the origin corresponding to the subscription; we do that while
creating a subscription as well.
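Concretely, the origin name is derived from the subscription's OID
(PostgreSQL uses the form "pg_" plus the OID), which is why the replication
progress can only be set once the subscription exists and its OID is known.
A minimal sketch of the derivation:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;       /* stand-in for the server's typedef */

/* Sketch: replication origin names for subscriptions follow the pattern
 * "pg_<subscription oid>", so pg_subscriber must create the subscription
 * first, look up its OID, and only then can it address the origin to set
 * the progress to the consistent LSN. */
void
make_origin_name(Oid suboid, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "pg_%u", suboid);
}
```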

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
vignesh C
Date:
On Wed, 6 Dec 2023 at 12:53, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
>
> On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > On 08.11.23 00:12, Michael Paquier wrote:
> >> - Should the subdirectory pg_basebackup be renamed into something more
> >> generic at this point?  All these things are frontend tools that deal
> >> in some way with the replication protocol to do their work.  Say
> >> a replication_tools?
> >
> > Seems like unnecessary churn.  Nobody has complained about any of the other
> > tools in there.
>
> Not sure.  We rename things across releases in the tree from time to
> time, and here that's straight-forward.
>
>
> Based on this discussion it seems we have a consensus that this tool should be
> in the pg_basebackup directory. (If/when we agree with the directory renaming,
> it could be done in a separate patch.) Besides this move, the v3 provides a dry
> run mode. It basically executes every routine but skip when should do
> modifications. It is an useful option to check if you will be able to run it
> without having issues with connectivity, permission, and existing objects
> (replication slots, publications, subscriptions). Tests were slightly improved.
> Messages were changed to *not* provide INFO messages by default and --verbose
> provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> also refactored the connect_database() function into which the connection will
> always use the logical replication mode. A bug was fixed in the transient
> replication slot name. Ashutosh review [1] was included. The code was also indented.
>
> There are a few suggestions from Ashutosh [2] that I will reply in another
> email.
>
> I'm still planning to work on the following points:
>
> 1. improve the cleanup routine to point out leftover objects if there is any
>    connection issue.
> 2. remove the physical replication slot if the standby is using one
>    (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
>    stop the replication between the nodes and remove the replication setup
>    (publications, subscriptions, replication slots). Or even include another
>    action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.

1) This Assert can fail if the source is shut down:
+static void
+drop_replication_slot(PGconn *conn, LogicalRepInfo *dbinfo, const
char *slot_name)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+
+       Assert(conn != NULL);

I could simulate it by shutting down the primary while trying to reach the
consistent state:
pg_subscriber: postmaster reached the consistent state
pg_subscriber: error: connection to database failed: connection to
server at "localhost" (127.0.0.1), port 5432 failed: Connection
refused
Is the server running on that host and accepting TCP/IP connections?
pg_subscriber: error: connection to database failed: connection to
server at "localhost" (127.0.0.1), port 5432 failed: Connection
refused
Is the server running on that host and accepting TCP/IP connections?
pg_subscriber: error: connection to database failed: connection to
server at "localhost" (127.0.0.1), port 5432 failed: Connection
refused
Is the server running on that host and accepting TCP/IP connections?
pg_subscriber: pg_subscriber.c:692: drop_replication_slot: Assertion
`conn != ((void *)0)' failed.
Aborted
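A defensive fix along these lines would replace the assertion with a runtime
check so cleanup degrades gracefully when the source is already down (a
sketch; the real code would use pg_subscriber's logging and record the
leftover slot):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Sketch: a cleanup path should tolerate a lost connection instead of
 * asserting.  Returns false when the source server can no longer be
 * reached, leaving the slot behind for the user (or a later cleanup
 * routine) to deal with. */
bool
drop_slot_checked(void *conn, const char *slot_name)
{
    if (conn == NULL)
    {
        fprintf(stderr,
                "could not drop replication slot \"%s\": no connection\n",
                slot_name);
        return false;
    }
    /* ... would issue DROP_REPLICATION_SLOT over the connection here ... */
    return true;
}
```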

2) Should we have some checks to see if the max_replication_slots
configuration is OK based on the number of slots that will be created? We
have similar checks for upgrading replication slots in
check_new_cluster_logical_replication_slots.

3) Should we check if wal_level is set to logical? We have similar checks
for upgrading replication slots in
check_new_cluster_logical_replication_slots.

4) The physical replication slot that was created will still be present on
the primary node; I felt it should be removed.

5) I felt the target server should be started before pg_subscriber
completes:
+       /*
+        * Stop the subscriber.
+        */
+       pg_log_info("stopping the subscriber");
+
+       pg_ctl_cmd = psprintf("\"%s\" stop -D \"%s\" -s", pg_ctl_path,
subscriber_dir);
+       rc = system(pg_ctl_cmd);
+       pg_ctl_status(pg_ctl_cmd, rc, 0);
+
+       /*
+        * Change system identifier.
+        */
+       modify_sysid(pg_resetwal_path, subscriber_dir);
+
+       success = true;
+
+       pg_log_info("Done!");
+
+       return 0;

Regards,
Vignesh



Re: speed up a logical replica setup

From
vignesh C
Date:
On Wed, 1 Nov 2023 at 19:28, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> At this stage the standby would have various replication objects like
> publications, subscriptions, origins inherited from the upstream
> server and possibly very much active. With failover slots, it might
> inherit replication slots. Is it intended that the new subscriber also
> acts as publisher for source's subscribers OR that the new subscriber
> should subscribe to the upstreams of the source? Some use cases like
> logical standby might require that but a multi-master multi-node setup
> may not. The behaviour should be user configurable.

How about we do it like this:
a) Start the server in binary upgrade mode (so that the existing
subscriptions will not try to connect to the publishers).
b) Disable the subscriptions.
c) Drop the replication slots.
d) Drop the publications.
e) Restart the server in normal (non-upgrade) mode.
f) Do the rest of the pg_subscriber work:
create_all_logical_replication_slots, create_subscription,
set_replication_progress, enable_subscription, etc.
This will be done by default. An option --clean-logical-replication-info
will be provided so the DBA can choose to keep these objects instead.
I felt cleaning the replication information is better because: a) after the
pg_subscriber setup is done, Node-1 will replicate to Node-2 all the data
that Node-1 is subscribing to from other nodes; and b) all the data that
Node-1 is publishing need not be published again by Node-2.

Regards,
Vignesh



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Wed, Jan 3, 2024 at 12:09 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 1 Nov 2023 at 19:28, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > At this stage the standby would have various replication objects like
> > publications, subscriptions, origins inherited from the upstream
> > server and possibly very much active. With failover slots, it might
> > inherit replication slots. Is it intended that the new subscriber also
> > acts as publisher for source's subscribers OR that the new subscriber
> > should subscribe to the upstreams of the source? Some use cases like
> > logical standby might require that but a multi-master multi-node setup
> > may not. The behaviour should be user configurable.
>
> How about we do like this:
> a) Starting the server in binary upgrade mode(so that the existing
> subscriptions will not try to connect to the publishers)
>

Can't we simply do it by starting the server with
max_logical_replication_workers = 0, or is there some other need to
start in binary upgrade mode?

 b) Disable
> the subscriptions
>

Why not simply drop the subscriptions?

 c) Drop the replication slots d) Drop the
> publications
>

I am not so sure about dropping publications because, unlike subscriptions,
which can start to pull the data, there is no harm in keeping publications.
Similar to publications, there could be some user-defined functions or other
objects which may not be required once the standby replica is converted to a
subscriber. I guess we need to leave those to the user.

 e) Then restart the server in normal(non-upgrade) mode.
> f) The rest of pg_subscriber work like
> create_all_logical_replication_slots, create_subscription,
> set_replication_progress, enable_subscription, etc
> This will be done by default. There will be an option
> --clean-logical-replication-info provided to allow DBA not to clean
> the objects if DBA does not want to remove these objects.
>

I agree that some kind of switch to control this action would be useful.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
vignesh C
Date:
On Wed, 3 Jan 2024 at 14:49, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 3, 2024 at 12:09 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Wed, 1 Nov 2023 at 19:28, Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > At this stage the standby would have various replication objects like
> > > publications, subscriptions, origins inherited from the upstream
> > > server and possibly very much active. With failover slots, it might
> > > inherit replication slots. Is it intended that the new subscriber also
> > > acts as publisher for source's subscribers OR that the new subscriber
> > > should subscribe to the upstreams of the source? Some use cases like
> > > logical standby might require that but a multi-master multi-node setup
> > > may not. The behaviour should be user configurable.
> >
> > How about we do like this:
> > a) Starting the server in binary upgrade mode(so that the existing
> > subscriptions will not try to connect to the publishers)
> >
>
> Can't we simply do it by starting the server with
> max_logical_replication_workers = 0 or is there some other need to
> start in binary upgrade mode?

I agree, max_logical_replication_workers = 0 is enough for our case.
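In configuration terms, the idea is to start the target server with logical
replication workers disabled while pg_subscriber does its work;
max_logical_replication_workers is an existing GUC, and the setting could be
applied temporarily (for example via pg_ctl -o):

```
# temporary setting while pg_subscriber operates on the target cluster:
# no apply workers start, so inherited subscriptions cannot connect to
# their publishers
max_logical_replication_workers = 0
```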

>  b) Disable
> > the subscriptions
> >
>
> Why not simply drop the subscriptions?

Dropping subscriptions is ok as these subscriptions will not be required.

>  c) Drop the replication slots d) Drop the
> > publications
> >
>
> I am not so sure about dropping publications because, unlike
> subscriptions which can start to pull the data, there is no harm with
> publications. Similar to publications there could be some user-defined
> functions or other objects which may not be required once the
> standby replica is converted to subscriber. I guess we need to leave
> those to the user.

Yes, that makes sense.

Regards,
Vignesh



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Dec 21, 2023, at 3:16 AM, Amit Kapila wrote:
I think this is an important part. Shall we try to write to some file
the pending objects to be cleaned up? We do something like that during
the upgrade.

That's a good idea.

> 2. remove the physical replication slot if the standby is using one
>    (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
>    stop the replication between the nodes and remove the replication setup
>    (publications, subscriptions, replication slots). Or even include another
>    action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.
>

Isn't point 2 also essential? Otherwise, how would such a slot
be advanced or removed?

I'm worried about a scenario in which you will still use the primary. (Let's say
the logical replica will be promoted to a staging or dev server.) There is no
connection between the primary and this new server, so the primary slot is
useless after the promotion.
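For context, removing such a leftover physical slot on the primary is a single
call (the slot name here is illustrative, standing in for whatever the
standby's primary_slot_name was):

```sql
-- on the primary: drop the physical slot the standby was using
SELECT pg_drop_replication_slot('standby_slot');
```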

A few other points:
==============
1. Previously, I asked whether we need an additional replication slot
patch created to get consistent LSN and I see the following comment in
the patch:

+ *
+ * XXX we should probably use the last created replication slot to get a
+ * consistent LSN but it should be changed after adding pg_basebackup
+ * support.

Yeah, sure, we may want to do that after backup support and we can
keep a comment for the same but I feel as the patch stands today,
there is no good reason to keep it.

I'll remove the comment to avoid confusion.

Also, is there a reason that we
can't create the slots after backup is complete and before we write
recovery parameters

No.

2.
+ appendPQExpBuffer(str,
+   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
+   "WITH (create_slot = false, copy_data = false, enabled = false)",
+   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);

Shouldn't we enable two_phase by default for newly created
subscriptions? Is there a reason for not doing so?

Why? I decided to keep the default for some settings (streaming,
synchronous_commit, two_phase, disable_on_error). Unless there is a compelling
reason to enable it, I think we should use the default. Either way, data will
arrive on subscriber as soon as the prepared transaction is committed.
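For reference, if someone did want prepared transactions applied at PREPARE
time, the generated command could be extended explicitly; a hypothetical
variant (object names are examples, not what the tool emits):

```sql
CREATE SUBSCRIPTION pg_subscriber_sub
    CONNECTION 'host=primary port=5432 dbname=postgres'
    PUBLICATION pg_subscriber_pub
    WITH (create_slot = false, copy_data = false, enabled = false,
          two_phase = true);
```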

3. How about sync slots on the physical standby if present? Do we want
to retain those as it is or do we need to remove those? We are
actively working on the patch [1] for the same.

I didn't read the current version of the referred patch but if the proposal is
to synchronize logical replication slots only if you are using physical
replication, as soon as pg_subscriber finishes its execution, there won't be
synchronization on these logical replication slots because there isn't a
physical replication anymore. If the goal is a promotion, the current behavior
is correct because the logical replica will retain WAL since it was converted.
However, if you are creating a logical replica, this WAL retention is not good
and the customer should eventually remove these logical replication slots on
the logical replica.
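As a sketch of that manual cleanup (assuming the leftover synchronized slots
are the only inactive logical slots on the new replica, which is not
guaranteed in general):

```sql
-- on the converted logical replica: drop logical slots that are no
-- longer advanced because the physical link is gone
SELECT pg_drop_replication_slot(slot_name)
  FROM pg_replication_slots
 WHERE slot_type = 'logical' AND NOT active;
```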

4. Can we see some numbers with various sizes of databases (cluster)
to see how it impacts the time for small to large-size databases as
compared to the traditional method? This might help us with giving
users advice on when to use this tool. We can do this bit later as
well when the patch is closer to being ready for commit.

I'll share it.


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Jan 1, 2024, at 7:14 AM, vignesh C wrote:
1) This Assert can fail if the source is shut down:
+static void
+drop_replication_slot(PGconn *conn, LogicalRepInfo *dbinfo, const
char *slot_name)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+
+       Assert(conn != NULL);

Oops. I'll remove it.

2) Should we have some checks to see if the max_replication_slots
configuration is ok based on the number of slots that will be created?
We have similar checks for upgrading replication slots in
check_new_cluster_logical_replication_slots

That's a good idea.

3) Should we check if wal_level is set to logical? We have similar
checks for upgrading replication slots in
check_new_cluster_logical_replication_slots

That's a good idea.
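Both checks could be a couple of queries against the source server; a rough
sketch (the tool would run these via libpq):

```sql
-- wal_level must be 'logical' for logical decoding
SHOW wal_level;

-- free slots must cover one slot per database being converted
-- (plus the extra slot used to get the consistent LSN)
SELECT current_setting('max_replication_slots')::int - count(*)
       AS free_slots
  FROM pg_replication_slots;
```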

4) The physical replication slot that was created will still be
present on the primary node; I felt this should be removed.

My proposal is to remove it [1]. It'll be included in the next version.

5) I felt the target server should be started before completion of
pg_subscriber:

Why? The initial version had an option to stop the subscriber. I decided to
remove the option and stop the subscriber by default mainly because (1) it is
only an extra step to start the server and (2) the WAL retention caused by
additional (synchronized?) replication slots on the subscriber doesn't happen.
It was a conservative choice. If point 2 isn't an issue, imo point 1 is no big
deal.




--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 8:24 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Dec 21, 2023, at 3:16 AM, Amit Kapila wrote:
>
> > 2. remove the physical replication slot if the standby is using one
> >    (primary_slot_name).
> > 3. provide instructions to promote the logical replica into primary, I mean,
> >    stop the replication between the nodes and remove the replication setup
> >    (publications, subscriptions, replication slots). Or even include another
> >    action to do it. We could add both too.
> >
> > Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> > UI for users that would like to use it.
> >
>
> Isn't point 2 also essential because how would otherwise such a slot
> be advanced or removed?
>
>
> I'm worried about a scenario that you will still use the primary. (Let's say
> the logical replica will be promoted to a staging or dev server.) No connection
> between primary and this new server so the primary slot is useless after the
> promotion.
>

So, you also seem to be saying that it is not required once
pg_subscriber has promoted it. Why, then, should it be optional to remove
the physical replication slot? I think we must remove it from the primary
unless there is some other reason.

> A few other points:
> ==============
> 1. Previously, I asked whether we need an additional replication slot
> patch created to get consistent LSN and I see the following comment in
> the patch:
>
> + *
> + * XXX we should probably use the last created replication slot to get a
> + * consistent LSN but it should be changed after adding pg_basebackup
> + * support.
>
> Yeah, sure, we may want to do that after backup support and we can
> keep a comment for the same but I feel as the patch stands today,
> there is no good reason to keep it.
>
>
> I'll remove the comment to avoid confusion.
>

My point is to not have an additional slot and keep a comment
indicating that we need an extra slot once we add pg_basebackup
support.

>
> 2.
> + appendPQExpBuffer(str,
> +   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
> +   "WITH (create_slot = false, copy_data = false, enabled = false)",
> +   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);
>
> Shouldn't we enable two_phase by default for newly created
> subscriptions? Is there a reason for not doing so?
>
>
> Why? I decided to keep the default for some settings (streaming,
> synchronous_commit, two_phase, disable_on_error). Unless there is a compelling
> reason to enable it, I think we should use the default. Either way, data will
> arrive on subscriber as soon as the prepared transaction is committed.
>

I thought we could provide a better experience for logical replicas
created by default but I see your point and probably keeping default
values for parameters you mentioned seems reasonable to me.

> 3. How about sync slots on the physical standby if present? Do we want
> to retain those as it is or do we need to remove those? We are
> actively working on the patch [1] for the same.
>
>
> I didn't read the current version of the referred patch but if the proposal is
> to synchronize logical replication slots only if you are using a physical
> replication, as soon as pg_subscriber finishes the execution, there won't be
> synchronization on these logical replication slots because there isn't a
> physical replication anymore. If the goal is a promotion, the current behavior
> is correct because the logical replica will retain WAL since it was converted.
>

I don't understand what you mean by promotion in this context. If
users want to simply promote the standby then there is no need to do
additional things that this tool is doing.

> However, if you are creating a logical replica, this WAL retention is not good
> and the customer should eventually remove these logical replication slots on
> the logical replica.
>

I think asking users to manually remove such slots won't be a good
idea. We might want to either remove them by default or provide an
option to the user.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 8:52 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Jan 1, 2024, at 7:14 AM, vignesh C wrote:
>
>
> 5) I felt the target server should be started before completion of
> pg_subscriber:
>
>
> Why?
>

Won't it be a better user experience that after setting up the target
server as a logical replica (subscriber), it started to work
seamlessly without user intervention?

> The initial version had an option to stop the subscriber. I decided to
> remove the option and stop the subscriber by default mainly because (1) it is
> an extra step to start the server (another point is that the WAL retention
> doesn't happen due to additional (synchronized?) replication slots on
> subscriber -- point 2). It was a conservative choice. If point 2 isn't an
> issue, imo point 1 is no big deal.
>

By point 2, do you mean to have a check for "max replication slots"?
If so, one possibility is to even increase that config if the
required max_replication_slots is low.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,
I was testing the patch with following test cases:

Test 1 :
- Create a 'primary' node
- Setup physical replica using pg_basebackup "./pg_basebackup -h
localhost -X stream -v -R -W -D ../standby"
- Insert data before and after pg_basebackup
- Run pg_subscriber and then insert some data to check logical
replication "./pg_subscriber -D ../standby -S 'host=localhost
port=9000 dbname=postgres' -P 'host=localhost port=9000
dbname=postgres' -d postgres"
- Also check the pg_publication, pg_subscription and pg_replication_slots tables.

Observation:
Data is not lost. Replication is happening correctly. Pg_subscriber is
working as expected.

Test 2:
- Create a 'primary' node
- Use normal pg_basebackup but don't set up physical replication
"./pg_basebackup -h localhost -v -W -D ../standby"
- Insert data before and after pg_basebackup
- Run pg_subscriber

Observation:
The pg_subscriber command is not completing and is stuck with the
following log repeating:
LOG: waiting for WAL to become available at 0/3000168
LOG: invalid record length at 0/3000150: expected at least 24, got 0

Test 3:
- Create a 'primary' node
- Use normal pg_basebackup but don't set up physical replication
"./pg_basebackup -h localhost -v -W -D ../standby"
- Insert data before pg_basebackup but not after pg_basebackup
- Run pg_subscriber

Observation:
The pg_subscriber command is not completing and is stuck with the
following log repeating:
LOG: waiting for WAL to become available at 0/3000168
LOG: invalid record length at 0/3000150: expected at least 24, got 0

I was not clear about how to use pg_basebackup in this case, can you
let me know if any changes need to be made for test2 and test3.

Thanks and regards
Shlok Kyal



Re: speed up a logical replica setup

From
Ashutosh Bapat
Date:
On Wed, Jan 3, 2024 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>  c) Drop the replication slots d) Drop the
> > publications
> >
>
> I am not so sure about dropping publications because, unlike
> subscriptions which can start to pull the data, there is no harm with
> publications. Similar to publications there could be some user-defined
> functions or other objects which may not be required once the
> standby replica is converted to subscriber. I guess we need to leave
> those to the user.
>

IIUC, the primary use of the pg_subscriber utility is to start a logical
subscription from a physical base backup (to reduce initial sync time),
as against the logical backup taken while creating a subscription. Hence I
am expecting that, apart from this difference, the resultant logical
replica should look similar to a logical replica set up using a
logical subscription sync. So we should not leave any replication
objects around. UDFs (views, and other objects) may have some use on a
logical replica; we may replicate changes to UDFs once DDL replication
is supported. But what good is having the same publications as the primary
also on the logical replica?

--
Best Wishes,
Ashutosh Bapat



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 12:30 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Jan 3, 2024 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >  c) Drop the replication slots d) Drop the
> > > publications
> > >
> >
> > I am not so sure about dropping publications because, unlike
> > subscriptions which can start to pull the data, there is no harm with
> > publications. Similar to publications there could be some user-defined
> > functions or other objects which may not be required once the
> > standby replica is converted to subscriber. I guess we need to leave
> > those to the user.
> >
>
> IIUC, primary use of pg_subscriber utility is to start a logical
> subscription from a physical base backup (to reduce initial sync time)
> as against logical backup taken while creating a subscription. Hence I
> am expecting that apart from this difference, the resultant logical
> replica should look similar to the logical replica setup using a
> logical subscription sync. Hence we should not leave any replication
> objects around. UDFs (views, and other objects) may have some use on a
> logical replica. We may replicate changes to UDF once DDL replication
> is supported. But what good is having the same publications as primary
> also on logical replica?
>

The one use case that comes to my mind is to set up bi-directional
replication. The publishers want to subscribe to the new subscriber.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 12:22 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> Hi,
> I was testing the patch with following test cases:
>
> Test 1 :
> - Create a 'primary' node
> - Setup physical replica using pg_basebackup "./pg_basebackup -h
> localhost -X stream -v -R -W -D ../standby"
> - Insert data before and after pg_basebackup
> - Run pg_subscriber and then insert some data to check logical
> replication "./pg_subscriber -D ../standby -S 'host=localhost
> port=9000 dbname=postgres' -P 'host=localhost port=9000
> dbname=postgres' -d postgres"
> - Also check pg_publication, pg_subscriber and pg_replication_slots tables.
>
> Observation:
> Data is not lost. Replication is happening correctly. Pg_subscriber is
> working as expected.
>
> Test 2:
> - Create a 'primary' node
> - Use normal pg_basebackup but don't set up physical replication
> "./pg_basebackup -h localhost -v -W -D ../standby"
> - Insert data before and after pg_basebackup
> - Run pg_subscriber
>
> Observation:
> Pg_subscriber command is not completing and is stuck with following
> log repeating:
> LOG: waiting for WAL to become available at 0/3000168
> LOG: invalid record length at 0/3000150: expected at least 24, got 0
>

I think probably the required WAL is not copied. Can you use the -X
option to stream WAL as well and then test? But I feel in this case
also, we should wait for some threshold time and then exit with
failure, removing new objects created, if any.

> Test 3:
> - Create a 'primary' node
> - Use normal pg_basebackup but don't set up physical replication
> "./pg_basebackup -h localhost -v -W -D ../standby"
> - Insert data before pg_basebackup but not after pg_basebackup
> - Run pg_subscriber
>
> Observation:
> Pg_subscriber command is not completing and is stuck with following
> log repeating:
> LOG: waiting for WAL to become available at 0/3000168
> LOG: invalid record length at 0/3000150: expected at least 24, got 0
>

This is similar to the previous test and you can try the same option
here as well.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Ashutosh Bapat
Date:
On Thu, Jan 4, 2024 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > But what good is having the same publications as primary
> > also on logical replica?
> >
>
> The one use case that comes to my mind is to set up bi-directional
> replication. The publishers want to subscribe to the new subscriber.

Hmm. Looks like another user controlled cleanup.

--
Best Wishes,
Ashutosh Bapat



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Jan 4, 2024, at 2:41 AM, Amit Kapila wrote:
So, you also seem to be saying that it is not required once
pg_subscriber has promoted it. Why, then, should it be optional to remove
the physical replication slot? I think we must remove it from the primary
unless there is some other reason.

My point is to *always* remove the primary_slot_name on primary.

My point is to not have an additional slot and keep a comment
indicating that we need an extra slot once we add pg_basebackup
support.

Got it.

> 3. How about sync slots on the physical standby if present? Do we want
> to retain those as it is or do we need to remove those? We are
> actively working on the patch [1] for the same.
>
>
> I didn't read the current version of the referred patch but if the proposal is
> to synchronize logical replication slots only if you are using a physical
> replication, as soon as pg_subscriber finishes the execution, there won't be
> synchronization on these logical replication slots because there isn't a
> physical replication anymore. If the goal is a promotion, the current behavior
> is correct because the logical replica will retain WAL since it was converted.
>

I don't understand what you mean by promotion in this context. If
users want to simply promote the standby then there is no need to do
additional things that this tool is doing.

ENOCOFFEE. s/promotion/switchover/

> However, if you are creating a logical replica, this WAL retention is not good
> and the customer should eventually remove these logical replication slots on
> the logical replica.
>

I think asking users to manually remove such slots won't be a good
idea. We might want to either remove them by default or provide an
option to the user.

Am I correct that in the majority of use cases these replication slots will be
useless? If so, let's remove them by default and add an option to control this
behavior (replication slot removal).


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Jan 4, 2024, at 3:05 AM, Amit Kapila wrote:
Won't it be a better user experience that after setting up the target
server as a logical replica (subscriber), it started to work
seamlessly without user intervention?

If we have an option to control the replication slot removal (default is on),
it seems a good UI. Even if the user decides to disable the replication slot
removal, it should print a message saying that these replication slots can
cause WAL retention.

> The initial version had an option to stop the subscriber. I decided to
> remove the option and stop the subscriber by default mainly because (1) it is
> an extra step to start the server (another point is that the WAL retention
> doesn't happen due to additional (synchronized?) replication slots on
> subscriber -- point 2). It was a conservative choice. If point 2 isn't an
> issue, imo point 1 is no big deal.
>

By point 2, do you mean to have a check for "max replication slots"?
If so, one possibility is to even increase that config if the
required max_replication_slots is low.

By point 2, I mean WAL retention (the sentence inside the parentheses).


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 9:18 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Jan 4, 2024, at 2:41 AM, Amit Kapila wrote:
>
>
> I think asking users to manually remove such slots won't be a good
> idea. We might want to either remove them by default or provide an
> option to the user.
>
>
> Am I correct that in the majority of use cases these replication slots will be
> useless?
>

I am not so sure about it. Say, if some sync slots are present, this
means the user wants this replica to be used later as a publisher.
Now, if the existing primary/publisher node is still alive then we
don't need these slots, but if the user wants to switch over to this
new node as well then they may be required.

Is there a possibility that a cascading standby also has a slot on the
current physical replica being converted to a new subscriber?

> If so, let's remove them by default and add an option to control this
> behavior (replication slot removal).
>

The presence of slots on the physical replica indicates that the other
nodes/clusters could be dependent on it, so, I feel by default we
should give an error and if the user uses some option to remove slots
then it is fine to remove them.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 9:27 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Jan 4, 2024, at 3:05 AM, Amit Kapila wrote:
>
> Won't it be a better user experience that after setting up the target
> server as a logical replica (subscriber), it started to work
> seamlessly without user intervention?
>
>
> If we have an option to control the replication slot removal (default is on),
> it seems a good UI. Even if the user decides to disable the replication slot
> removal, it should print a message saying that these replication slots can
> cause WAL retention.
>

As pointed out in the previous response, I think we should not proceed
with such a risk of WAL retention and other nodes dependency, we
should either give an ERROR (default) or remove slots, if the user
provides an option. If we do so, do you think by default we can keep
the server started or let the user start it later? I think one
advantage of letting the user start it later is that she gets a chance
to adjust config parameters in postgresql.conf and by default we won't
be using system resources.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
On Thu, 4 Jan 2024 at 16:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 4, 2024 at 12:22 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > Hi,
> > I was testing the patch with following test cases:
> >
> > Test 1 :
> > - Create a 'primary' node
> > - Setup physical replica using pg_basebackup "./pg_basebackup -h
> > localhost -X stream -v -R -W -D ../standby"
> > - Insert data before and after pg_basebackup
> > - Run pg_subscriber and then insert some data to check logical
> > replication "./pg_subscriber -D ../standby -S 'host=localhost
> > port=9000 dbname=postgres' -P 'host=localhost port=9000
> > dbname=postgres' -d postgres"
> > - Also check pg_publication, pg_subscriber and pg_replication_slots tables.
> >
> > Observation:
> > Data is not lost. Replication is happening correctly. Pg_subscriber is
> > working as expected.
> >
> > Test 2:
> > - Create a 'primary' node
> > - Use normal pg_basebackup but don't set up physical replication
> > "./pg_basebackup -h localhost -v -W -D ../standby"
> > - Insert data before and after pg_basebackup
> > - Run pg_subscriber
> >
> > Observation:
> > Pg_subscriber command is not completing and is stuck with following
> > log repeating:
> > LOG: waiting for WAL to become available at 0/3000168
> > LOG: invalid record length at 0/3000150: expected at least 24, got 0
> >
>
> I think probably the required WAL is not copied. Can you use the -X
> option to stream WAL as well and then test? But I feel in this case
> also, we should wait for some threshold time and then exit with
> failure, removing new objects created, if any.

I have tested with the -X stream option in pg_basebackup as well. In this
case also, the pg_subscriber command is getting stuck.
logs:
2024-01-05 11:49:34.436 IST [61948] LOG:  invalid resource manager ID
102 at 0/3000118
2024-01-05 11:49:34.436 IST [61948] LOG:  waiting for WAL to become
available at 0/3000130

>
> > Test 3:
> > - Create a 'primary' node
> > - Use normal pg_basebackup but don't set up physical replication
> > "./pg_basebackup -h localhost -v -W -D ../standby"
> > - Insert data before pg_basebackup but not after pg_basebackup
> > - Run pg_subscriber
> >
> > Observation:
> > Pg_subscriber command is not completing and is stuck with following
> > log repeating:
> > LOG: waiting for WAL to become available at 0/3000168
> > LOG: invalid record length at 0/3000150: expected at least 24, got 0
> >
>
> This is similar to the previous test and you can try the same option
> here as well.
For this test as well, I tried with the -X stream option in pg_basebackup.
It is getting stuck here as well, with a similar log.

Will investigate the issue further.


Thanks and regards
Shlok Kyal



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

I love your proposal, so I want to join the review. Here are my first comments.

01.
Should we restrict `--subscriber-conninfo` so that it must not have a hostname
or IP? We want users to execute pg_subscriber on the target, right?

02.
When the application was executed, a lot of output filled my screen. Some of it
was from pg_subscriber, and the rest was server log. Can we record them in
separate files? I imagine something like pg_upgrade.

03.
A replication command is used when replication slots are created. Is there a
reason to use it? I think we do not have to use the logical replication
walsender mode; we can use an SQL function instead.
pg_create_logical_replication_slot() also outputs the LSN; isn't that sufficient?
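For reference, the SQL-level route returns the consistent point directly (the
slot name and the pgoutput plugin below are examples):

```sql
-- returns (slot_name, lsn); lsn is the consistent decoding start point
SELECT slot_name, lsn
  FROM pg_create_logical_replication_slot('pg_subscriber_1', 'pgoutput');
```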

04.
As you know, there are several options for publications/subscriptions/replication
slots. Do you have a good way to specify them in your mind?

05.
I found that the connection string for each subscription has the setting
"fallback_application_name=pg_subscriber". Can we remove it?

```
postgres=# SELECT subconninfo FROM pg_subscription;
                                   subconninfo
---------------------------------------------------------------------------------
 user=postgres port=5431 fallback_application_name=pg_subscriber dbname=postgres
(1 row)
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Jan 5, 2024 at 3:36 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> I love your proposal, so I want to join the review. Here are my first comments.
>
> 01.
> Should we restrict that `--subscriber-conninfo` must not have hostname or IP?
> We want users to execute pg_subscriber on the target, right?
>

I don't see any harm in users giving that information, but we should
have some checks to ensure that the server is in standby mode and is
running locally. The other related point is: do we need to take input
for the target cluster directory from the user? Can't we fetch that
information once we are connected to the standby?
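A sketch of both checks over an existing connection to the target:

```sql
-- true while the server is running as a standby (in recovery)
SELECT pg_is_in_recovery();

-- the data directory can be fetched from the server itself
SHOW data_directory;
```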

>
> 05.
> I found that the connection string for each subscriptions have a setting
> "fallback_application_name=pg_subscriber". Can we remove it?
>
> ```
> postgres=# SELECT subconninfo FROM pg_subscription;
>                                    subconninfo
> ---------------------------------------------------------------------------------
>  user=postgres port=5431 fallback_application_name=pg_subscriber dbname=postgres
> (1 row)
> ```

Can that help distinguish the pg_subscriber connection on the publisher?


--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> On Fri, Jan 5, 2024 at 3:36 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > I love your proposal, so I want to join the review. Here are my first comments.
> >
> > 01.
> > Should we restrict that `--subscriber-conninfo` must not have hostname or IP?
> > We want users to execute pg_subscriber on the target, right?
> >
> 
> I don't see any harm in users giving those information but we should
> have some checks to ensure that the server is in standby mode and is
> running locally. The other related point is do we need to take input
> for the target cluster directory from the user? Can't we fetch that
> information once we are connected to standby?

I think that functions like inet_client_addr() may be usable, but it returns
NULL only when the connection is via a Unix-domain socket. Can we restrict
pg_subscriber to use such a socket?
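A sketch of that check: since inet_client_addr() is NULL only for
Unix-domain-socket connections, the tool could verify it like this:

```sql
-- a NULL client address means the connection came over a Unix-domain socket
SELECT inet_client_addr() IS NULL AS via_unix_socket;
```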

> >
> > 05.
> > I found that the connection string for each subscriptions have a setting
> > "fallback_application_name=pg_subscriber". Can we remove it?
> >
> > ```
> > postgres=# SELECT subconninfo FROM pg_subscription;
> >                                    subconninfo
> >
> ---------------------------------------------------------------------------------
> >  user=postgres port=5431 fallback_application_name=pg_subscriber
> dbname=postgres
> > (1 row)
> > ```
> 
> Can that help distinguish the pg_subscriber connection on the publisher?
>

Note that this connection string is used between the publisher instance and the
subscriber instance (not the pg_subscriber client application). Also, the
fallback_application_name would be replaced with the name of the subscriber in
run_apply_worker()->walrcv_connect(), so the value would not actually be used.
See the output below on the publisher.

```
publisher=# SELECT application_name, backend_type FROM pg_stat_activity where backend_type = 'walsender';
   application_name   | backend_type 
----------------------+--------------
 pg_subscriber_5_9411 | walsender
(1 row)
```

Or, if you mean that this can distinguish whether the subscription is used
by pg_subscriber or not: I think the current name format is sufficient.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Jan 8, 2024 at 12:35 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > On Fri, Jan 5, 2024 at 3:36 PM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > I love your proposal, so I want to join the review. Here are my first comments.
> > >
> > > 01.
> > > Should we restrict that `--subscriber-conninfo` must not have hostname or IP?
> > > We want users to execute pg_subscriber on the target, right?
> > >
> >
> > I don't see any harm in users giving those information but we should
> > have some checks to ensure that the server is in standby mode and is
> > running locally. The other related point is do we need to take input
> > for the target cluster directory from the user? Can't we fetch that
> > information once we are connected to standby?
>
> I think that functions like inet_client_addr() may be usable, but it returns
> NULL only when the connection is via a Unix-domain socket. Can we restrict
> pg_subscriber to use such a socket?
>

Good question. So, IIUC, this tool has a requirement to run locally
where the standby is present because we want to write the recovery.conf file.
I am not sure if it is a good idea to have a restriction to use only
Unix-domain sockets, as users need to set up the standby for that by
configuring unix_socket_directories. It is fine if we can't ensure
that it is running locally, but we should at least ensure that the
server is a physical standby node, to avoid the problems that Shlok has
reported.

On a related point, I see that the patch stops the standby server (if
it is running) before starting with the subscriber-side steps. I was
wondering if users could object that some important data replication
was in progress which this tool stopped. OTOH, once the user uses
pg_subscriber, the standby server will be converted to a subscriber
anyway, so it may not be useful as a physical replica. Do you or
others have any thoughts on this matter?

> > >
> > > 05.
> > > I found that the connection string for each subscriptions have a setting
> > > "fallback_application_name=pg_subscriber". Can we remove it?
> > >
> > > ```
> > > postgres=# SELECT subconninfo FROM pg_subscription;
> > >                                    subconninfo
> > >
> > ---------------------------------------------------------------------------------
> > >  user=postgres port=5431 fallback_application_name=pg_subscriber
> > dbname=postgres
> > > (1 row)
> > > ```
> >
> > Can that help distinguish the pg_subscriber connection on the publisher?
> >
>
> Note that this connection string is used between the publisher instance and the
> subscriber instance (not the pg_subscriber client application). Also, the
> fallback_application_name would be replaced with the name of the subscriber in
> run_apply_worker()->walrcv_connect(). Actually, the value would not be used.
>

Fair point. It is not clear what other purpose this can achieve;
perhaps Euler has something in mind for it.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> > > I don't see any harm in users giving those information but we should
> > > have some checks to ensure that the server is in standby mode and is
> > > running locally. The other related point is do we need to take input
> > > for the target cluster directory from the user? Can't we fetch that
> > > information once we are connected to standby?
> >
> > I think that functions like inet_client_addr() may be able to use, but it returns
> > NULL only when the connection is via a Unix-domain socket. Can we restrict
> > pg_subscriber to use such a socket?
> >
> 
> Good question. So, IIUC, this tool has a requirement to run locally
> where the standby is present because we want to write the recovery.conf file.
> I am not sure if it is a good idea to restrict it to using only
> the Unix-domain socket, as users would need to set up the standby for that by
> configuring unix_socket_directories. It is fine if we can't ensure
> that it is running locally, but we should at least ensure that the
> server is a physical standby node to avoid the problems that Shlok has
> reported.

While thinking more about it, I realized that we have not defined a policy on
whether users must not connect to the target while pg_subscriber is running.
What should it be? If connections should be avoided, some parameters like
listen_addresses and unix_socket_permissions should be restricted, like
start_postmaster() does in pg_upgrade/server.c. Also, the port number should be
changed to another value as well.

Personally, I vote to reject connections during the pg_subscriber run.

> On a related point, I see that the patch stops the standby server (if
> it is running) before starting with subscriber-side steps. I was
> wondering if users can object to it that there was some important data
> replication in progress which this tool has stopped. Now, OTOH,
> anyway, once the user uses pg_subscriber, the standby server will be
> converted to a subscriber, so it may not be useful as a physical
> replica. Do you or others have any thoughts on this matter?

I assumed that connections should be closed before running pg_subscriber. If so,
it may be better to just fail the command when the physical standby has already
been started. There is no single answer as to whether data replication and user
queries should stop; users should choose when to stop based on their own policy,
and then pg_subscriber can start the postmaster.
pg_upgrade does the same thing in setup().

====

Further comment:
According to the doc, pg_subscriber is currently listed among the client
applications. But based on the definition, I feel it should be on the
"PostgreSQL Server Applications" page. What do you think? The definition is:

>
This part contains reference information for PostgreSQL server applications and
support utilities. These commands can only be run usefully on the host where the
database server resides. Other utility programs are listed in PostgreSQL Client
Applications.
>

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Tue, Jan 9, 2024 at 12:31 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > > > I don't see any harm in users giving those information but we should
> > > > have some checks to ensure that the server is in standby mode and is
> > > > running locally. The other related point is do we need to take input
> > > > for the target cluster directory from the user? Can't we fetch that
> > > > information once we are connected to standby?
> > >
> > > I think that functions like inet_client_addr() may be able to use, but it returns
> > > NULL only when the connection is via a Unix-domain socket. Can we restrict
> > > pg_subscriber to use such a socket?
> > >
> >
> > Good question. So, IIUC, this tool has a requirement to run locally
> > where the standby is present because we want to write the recovery.conf file.
> > I am not sure if it is a good idea to have a restriction to use only
> > the unix domain socket as users need to set up the standby for that by
> > configuring unix_socket_directories. It is fine if we can't ensure
> > that it is running locally but we should at least ensure that the
> > server is a physical standby node to avoid the problems as Shlok has
> > reported.
>
> While thinking more about it, I found that we did not define the policy
> whether user must not connect to the target while running pg_subscriber. What
> should be? If it should be avoided, some parameters like listen_addresses and
> unix_socket_permissions should be restricted like start_postmaster() in
> pg_upgrade/server.c.
>

Yeah, this makes sense to me.

> Also, the port number should be changed to another value
> as well.
>

Fair point, but I think in that case we should take this as one of the
parameters.

> Personally, I vote to reject connections during the pg_subscriber.
>
> > On a related point, I see that the patch stops the standby server (if
> > it is running) before starting with subscriber-side steps. I was
> > wondering if users can object to it that there was some important data
> > replication in progress which this tool has stopped. Now, OTOH,
> > anyway, once the user uses pg_subscriber, the standby server will be
> > converted to a subscriber, so it may not be useful as a physical
> > replica. Do you or others have any thoughts on this matter?
>
> I assumed that connections should be closed before running pg_subscriber. If so,
> it may be better to just fail the command when the physical standby has already
> been started. There is no single answer as to whether data replication and user
> queries should stop; users should choose when to stop based on their own policy,
> and then pg_subscriber can start the postmaster.
> pg_upgrade does the same thing in setup().
>

Agreed.

> ====
>
> Further comment:
> According to the doc, pg_subscriber is currently listed among the client
> applications. But based on the definition, I feel it should be on the
> "PostgreSQL Server Applications" page. What do you think?
>

I also think it should be a server application.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
On Fri, 5 Jan 2024 at 12:19, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Thu, 4 Jan 2024 at 16:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jan 4, 2024 at 12:22 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > Hi,
> > > I was testing the patch with following test cases:
> > >
> > > Test 1 :
> > > - Create a 'primary' node
> > > - Setup physical replica using pg_basebackup  "./pg_basebackup -h
> > > localhost -X stream -v -R -W -D ../standby "
> > > - Insert data before and after pg_basebackup
> > > - Run pg_subscriber and then insert some data to check logical
> > > replication "./pg_subscriber -D ../standby -S "host=localhost
> > > port=9000 dbname=postgres" -P "host=localhost port=9000
> > > dbname=postgres" -d postgres"
> > > - Also check the pg_publication, pg_subscription and pg_replication_slots tables.
> > >
> > > Observation:
> > > Data is not lost. Replication is happening correctly. Pg_subscriber is
> > > working as expected.
> > >
> > > Test 2:
> > > - Create a 'primary' node
> > > - Use normal pg_basebackup but don't set up physical replication
> > > "./pg_basebackup -h localhost -v -W -D ../standby"
> > > - Insert data before and after pg_basebackup
> > > - Run pg_subscriber
> > >
> > > Observation:
> > > Pg_subscriber command is not completing and is stuck with following
> > > log repeating:
> > > LOG: waiting for WAL to become available at 0/3000168
> > > LOG: invalid record length at 0/3000150: expected at least 24, got 0
> > >
> >
> > I think probably the required WAL is not copied. Can you use the -X
> > option to stream WAL as well and then test? But I feel in this case
> > also, we should wait for some threshold time and then exit with
> > failure, removing new objects created, if any.
>
> I have tested with -X stream option in pg_basebackup as well. In this
> case also the pg_subscriber command is getting stuck.
> logs:
> 2024-01-05 11:49:34.436 IST [61948] LOG:  invalid resource manager ID
> 102 at 0/3000118
> 2024-01-05 11:49:34.436 IST [61948] LOG:  waiting for WAL to become
> available at 0/3000130
>
> >
> > > Test 3:
> > > - Create a 'primary' node
> > > - Use normal pg_basebackup but don't set up physical replication
> > > "./pg_basebackup -h localhost -v -W -D ../standby"
> > > -Insert data before pg_basebackup but not after pg_basebackup
> > > -Run pg_subscriber
> > >
> > > Observation:
> > > Pg_subscriber command is not completing and is stuck with following
> > > log repeating:
> > > LOG: waiting for WAL to become available at 0/3000168
> > > LOG: invalid record length at 0/3000150: expected at least 24, got 0
> > >
> >
> > This is similar to the previous test and you can try the same option
> > here as well.
> For this test as well, I tried with the -X stream option in pg_basebackup.
> It is getting stuck here as well with a similar log.
>
> Will investigate the issue further.

I noticed that pg_subscriber gets stuck when we run it on a node
which is not a standby. It is because of this code:
+   conn = connect_database(dbinfo[0].pubconninfo);
+   if (conn == NULL)
+       exit(1);
+   consistent_lsn = create_logical_replication_slot(conn, &dbinfo[0],
+                                                    temp_replslot);
+
.....
+else
+   {
+       appendPQExpBuffer(recoveryconfcontents, "recovery_target_lsn = '%s'\n",
+                         consistent_lsn);
+       WriteRecoveryConfig(conn, subscriber_dir, recoveryconfcontents);
+   }

Here the standby node would be waiting for the 'consistent_lsn' WAL
during recovery, but this WAL will not be present on the standby if no
physical replication is set up. Hence the command will wait
indefinitely for the WAL.
To solve this, I added a timeout of 60s for the recovery process and also
added a check so that pg_subscriber gives an error when it is called
for a node which is not in physical replication.
I have attached a patch for the same. It is a top-up patch of the
patch shared by Euler at [1].
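The bounded wait could be shaped like the generic poll loop below (a sketch only: wait_for_condition and the sample predicates are invented names, the predicate stands in for "has recovery reached the consistent LSN?", and the 60s default would presumably become a user-settable option):

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Sample predicates standing in for a real recovery-progress check. */
static int always_ready(void *arg) { (void) arg; return 1; }
static int never_ready(void *arg)  { (void) arg; return 0; }

/*
 * Poll a readiness predicate once per second until it returns true or
 * the timeout (in seconds) expires.  Returns 1 on success, 0 on
 * timeout, in which case the caller should clean up any objects it
 * created and exit with failure.
 */
static int
wait_for_condition(int (*ready) (void *), void *arg, int timeout_secs)
{
	time_t		deadline = time(NULL) + timeout_secs;

	for (;;)
	{
		if (ready(arg))
			return 1;
		if (time(NULL) >= deadline)
			return 0;
		sleep(1);
	}
}
```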

Please review the changes and merge the changes if it looks ok.

[1] - https://www.postgresql.org/message-id/e02a2c17-22e5-4ba6-b788-de696ab74f1e%40app.fastmail.com

Thanks and regards
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Wed, 6 Dec 2023 at 12:53, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
>
> On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > On 08.11.23 00:12, Michael Paquier wrote:
> >> - Should the subdirectory pg_basebackup be renamed into something more
> >> generic at this point?  All these things are frontend tools that deal
> >> in some way with the replication protocol to do their work.  Say
> >> a replication_tools?
> >
> > Seems like unnecessary churn.  Nobody has complained about any of the other
> > tools in there.
>
> Not sure.  We rename things across releases in the tree from time to
> time, and here that's straight-forward.
>
>
> Based on this discussion it seems we have a consensus that this tool should be
> in the pg_basebackup directory. (If/when we agree with the directory renaming,
> it could be done in a separate patch.) Besides this move, the v3 provides a dry
> run mode. It basically executes every routine but skips anything that would
> make modifications. It is a useful option to check whether you will be able to
> run it without having issues with connectivity, permissions, and existing
> objects (replication slots, publications, subscriptions). Tests were slightly
> improved. Messages were changed to *not* provide INFO messages by default;
> --verbose provides INFO messages and --verbose --verbose also provides DEBUG
> messages. I also refactored the connect_database() function so that the
> connection will always use the logical replication mode. A bug was fixed in the
> transient replication slot name. Ashutosh's review [1] was included. The code
> was also indented.
>
> There are a few suggestions from Ashutosh [2] that I will reply in another
> email.
>
> I'm still planning to work on the following points:
>
> 1. improve the cleanup routine to point out leftover objects if there is any
>    connection issue.
> 2. remove the physical replication slot if the standby is using one
>    (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
>    stop the replication between the nodes and remove the replication setup
>    (publications, subscriptions, replication slots). Or even include another
>    action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.

Few comments:
1) We should not allow specifying the same database name twice, as we
would try to create the slot multiple times on the publisher; this can
be detected while parsing the options and an error can be thrown:
+                       case 'd':
+                               simple_string_list_append(&database_names, optarg);
+                               num_dbs++;
+                               break;

+static bool
+create_all_logical_replication_slots(LogicalRepInfo *dbinfo)
+{
+       int                     i;
+
+       for (i = 0; i < num_dbs; i++)
+       {
+               PGconn     *conn;
+               PGresult   *res;
+               char            replslotname[NAMEDATALEN];
....
....
....
+               /* Create replication slot on publisher. */
+               if (create_logical_replication_slot(conn, &dbinfo[i], replslotname) != NULL || dry_run)
+                       pg_log_info("create replication slot \"%s\" on publisher", replslotname);
+               else
+                       return false;
+
+               disconnect_database(conn);
+       }

E.g.: pg_subscriber -d postgres -d postgres
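The early check could be a simple scan over the collected names at option-parsing time, along these lines (a sketch; has_duplicate_dbname is an invented helper, and the real code would iterate the SimpleStringList rather than a plain array):

```c
#include <assert.h>
#include <string.h>

/*
 * Report whether any database name appears more than once, so that
 * "pg_subscriber -d postgres -d postgres" can fail early instead of
 * attempting to create the same replication slot twice.
 */
static int
has_duplicate_dbname(const char **names, int n)
{
	for (int i = 0; i < n; i++)
		for (int j = i + 1; j < n; j++)
			if (strcmp(names[i], names[j]) == 0)
				return 1;
	return 0;
}
```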

2) 2023 should be changed to 2024
+/*-------------------------------------------------------------------------
+ *
+ * pg_subscriber.c
+ *       Create a new logical replica from a standby server
+ *
+ * Copyright (C) 2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *             src/bin/pg_subscriber/pg_subscriber.c
+ *
+ *-------------------------------------------------------------------------
+ */

3) Similarly here too:
diff --git a/src/bin/pg_basebackup/t/040_pg_subscriber.pl
b/src/bin/pg_basebackup/t/040_pg_subscriber.pl
new file mode 100644
index 0000000000..9d20847dc2
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_subscriber.pl
@@ -0,0 +1,44 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+#
+# Test checking options of pg_subscriber.
+#

4) Similarly here too:
diff --git a/src/bin/pg_basebackup/t/041_pg_subscriber_standby.pl
b/src/bin/pg_basebackup/t/041_pg_subscriber_standby.pl
new file mode 100644
index 0000000000..ce25608c68
--- /dev/null
+++ b/src/bin/pg_basebackup/t/041_pg_subscriber_standby.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+#
+# Test using a standby server as the subscriber.

Regards,
Vignesh



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Jan 10, 2024, at 1:33 AM, Shlok Kyal wrote:
Here the standby node would be waiting for the 'consistent_lsn' WAL
during recovery, but this WAL will not be present on the standby if no
physical replication is set up. Hence the command will wait
indefinitely for the WAL.

Hmm. Some validations are missing.

To solve this, I added a timeout of 60s for the recovery process and also
added a check so that pg_subscriber gives an error when it is called
for a node which is not in physical replication.
I have attached a patch for the same. It is a top-up patch of the
patch shared by Euler at [1].

If the user has a node that is not a standby and does not set the GUCs to
start the recovery process from a backup, the initial setup is broken. (That's
the case you described.) A good UI would detect this scenario earlier.
Unfortunately, there isn't a reliable and cheap way to do it. You need to start
the recovery and check if it is making progress. (I don't have a strong
opinion about requiring a standby to use this tool. It would reduce the
complexity of checking whether the setup meets all the requirements to run this
tool.)


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 11, 2024 at 7:59 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Jan 10, 2024, at 1:33 AM, Shlok Kyal wrote:
>
> Here the standby node would be waiting for the 'consistent_lsn' wal
> during recovery but this wal will not be present on standby if no
> physical replication is setup. Hence the command will be waiting
> infinitely for the wal.
>
>
> Hmm. Some validations are missing.
>
> To solve this added a timeout of 60s for the recovery process and also
> added a check so that pg_subscriber would give a error when it called
> for node which is not in physical replication.
> Have attached the patch for the same. It is a top-up patch of the
> patch shared by Euler at [1].
>
>
> If the user has a node that is not a standby and does not set the GUCs to
> start the recovery process from a backup, the initial setup is broken. (That's
> the case you described.) A good UI would detect this scenario earlier.
> Unfortunately, there isn't a reliable and cheap way to do it. You need to start
> the recovery and check if it is making progress. (I don't have a strong
> opinion about requiring a standby to use this tool. It would reduce the
> complexity of checking whether the setup meets all the requirements to run this
> tool.)
>

Right, such a check will reduce some complexity. So, +1 for the check
as proposed by Shlok. Also, what are your thoughts on a timeout during
the wait? I think it is okay to wait for 60s by default but there
should be an option for users to wait for longer.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

I have been concerned that the patch has not been tested by cfbot due to an
application error. Also, some comments were raised. Therefore, I created a patch
to move forward.
I also tried to address some comments that were not strongly insisted on by
others. They are included in the 0003 patch.

* 0001 patch
It is almost the same as v3-0001, which was posted by Euler.
An unnecessary change for Mkvcbuild.pm (this file was removed) was ignored.

* 0002 patch
This contains small fixes to keep the compiler quiet.

* 0003 patch
This addresses comments posted to -hackers. For now, it does not contain docs.
I will add them if everyone agrees with these ideas.

1.
An option --port was added to control the port number for the physical standby.
Users can specify a port number via the option, or the environment variable PGSUBPORT.
If not specified, a fixed value (50111) is used.

SOURCE: [1]

2.
A FATAL error is raised if --subscriber-conninfo specifies a non-local server.

SOURCE: [2]

3. 
Options -o/-O were added to specify options for publications/subscriptions.

SOURCE: [2]

4. 
Made the standby save its output to a log file.

SOURCE: [2]

5. 
Unnecessary Assert in drop_replication_slot() was removed.

SOURCE: [3]

What do you think?
Thanks to Shlok and Vignesh for working with me offline.

[1]:
https://www.postgresql.org/message-id/TY3PR01MB988978C7362A101927070D29F56A2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TY3PR01MB9889593399165B9A04106741F5662%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CALDaNm098Jkbh%2Bye6zMj9Ro9j1bBe6FfPV80BFbs1%3DpUuTJ07g%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Jan 11, 2024, at 9:18 AM, Hayato Kuroda (Fujitsu) wrote:
I have been concerned that the patch has not been tested by cfbot due to an
application error. Also, some comments were raised. Therefore, I created a patch
to move forward.

Let me send an updated patch to hopefully keep the CF bot happy. The following
items are included in this patch:

* drop physical replication slot if standby is using one [1].
* cleanup small changes (copyright, .gitignore) [2][3]
* fix getopt_long() options [2]
* fix format specifier for some messages
* move doc to Server Application section [4]
* fix assert failure
* ignore duplicate database names [2]
* store subscriber server log into a separate file
* remove MSVC support

I'm still addressing other reviews and I'll post another version that includes
it soon.



--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Jan 11, 2024, at 9:18 AM, Hayato Kuroda (Fujitsu) wrote:

I have been concerned that the patch has not been tested by cfbot due to an
application error. Also, some comments were raised. Therefore, I created a patch
to move forward.
I also tried to address some comments that were not strongly insisted on by
others. They were included in the 0003 patch.

[I removed the following part in the previous email and couldn't reply to it...]

* 0001 patch
It is almost the same as v3-0001, which was posted by Euler.
An unnecessary change for Mkvcbuild.pm (this file was removed) was ignored.

v5 removes the MSVC support.

* 0002 patch
This contains small fixes to keep the compiler quiet.

I applied it, although I used a different approach for the format specifiers.

* 0003 patch
This addresses comments posted to -hackers. For now, it does not contain docs.
I will add them if everyone agrees with these ideas.

I didn't review all items but ...

1.
An option --port was added to control the port number for physical standby.
Users can specify a port number via the option, or an environment variable PGSUBPORT.
If not specified, a fixed value (50111) would be used.

My first reaction as a new user would be: why do I need to specify a port if my
--subscriber-conninfo already contains a port? Ugh. I'm wondering if we can do
it behind the scenes. Try a range of ports.
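Doing it behind the scenes could amount to a loopback bind probe over a port range, roughly as below (illustration only: find_free_port is hypothetical, there is an unavoidable race between probing and actually starting the postmaster, and the Unix-domain socket case is not covered):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Return the first TCP port in [lo, hi] that can currently be bound
 * on the loopback interface, or -1 if none can.
 */
static int
find_free_port(int lo, int hi)
{
	for (int port = lo; port <= hi; port++)
	{
		struct sockaddr_in addr;
		int			fd = socket(AF_INET, SOCK_STREAM, 0);
		int			ok;

		if (fd < 0)
			return -1;
		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
		addr.sin_port = htons((unsigned short) port);
		ok = (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0);
		close(fd);
		if (ok)
			return port;		/* free right now; may be taken later */
	}
	return -1;
}
```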

2.
A FATAL error would be raised if --subscriber-conninfo specifies non-local server.

Extra protection is always good. However, let's make sure this code path is
really useful. I'll think a bit about it.

3. 
Options -o/-O were added to specify options for publications/subscriptions.

Flexibility is cool. However, I think its cost-benefit is not good. You
have to parse the options to catch preliminary errors. Things like publishing
only deletes, and subscription options that conflict with the embedded ones, are
additional sources of failure.

4. 
Made the standby save its output to a log file.

It was already done in v5; I did it in a different way.

5. 
Unnecessary Assert in drop_replication_slot() was removed.

Instead, I fixed the code and kept the assert.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Jan 12, 2024 at 4:30 AM Euler Taveira <euler@eulerto.com> wrote:
>
>
> 3.
> Options -o/-O were added to specify options for publications/subscriptions.
>
>
> Flexibility is cool. However, I think the cost benefit of it is not good. You
> have to parse the options to catch preliminary errors. Things like publish only
> delete and subscription options that conflicts with the embedded ones are
> additional sources of failure.
>

Yeah, I am also not sure we need those. Did we discuss that
previously? OTOH, we may consider enhancing this tool later if there
is user demand for such options.

BTW, I think we need some way to at least drop the existing
subscriptions, otherwise the newly created subscriber will attempt to
fetch data that may not be intended. Ashutosh argued earlier in the
thread that we need an option for publications as well.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit, Euler,

> >
> >
> > 3.
> > Options -o/-O were added to specify options for publications/subscriptions.
> >
> >
> > Flexibility is cool. However, I think the cost benefit of it is not good. You
> > have to parse the options to catch preliminary errors. Things like publish only
> > delete and subscription options that conflicts with the embedded ones are
> > additional sources of failure.
> >
> 
> Yeah, I am also not sure we need those. Did we discussed about that
> previously? OTOH, we may consider to enhance this tool later if we
> have user demand for such options.

OK, so let's drop it for now and consider it later as an enhancement.
As Euler said, it leads to additional ERRORs.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Sorry for disturbing your work, and thanks for the updates.
I will review your patch again.

>
* 0001 patch
It is almost the same as v3-0001, which was posted by Euler.
An unnecessary change for Mkvcbuild.pm (this file was removed) was ignored.

v5 removes the MSVC support.
>

Confirmed that the patch could be applied.

>
* 0002 patch
This contains small fixes to keep complier quiet.

I applied it. Although, I used a different approach for format specifier.
>

Good, all warnings were removed. However, the patch failed to pass tests on FreeBSD twice.
I'm not quite sure about the ERROR, but is it related to us?

>
* 0003 patch
This addresses comments posted to -hackers. For now, this does not contain a doc.
Will add if everyone agrees these idea.

I didn't review all items but ...

1.
An option --port was added to control the port number for physical standby.
Users can specify a port number via the option, or an environment variable PGSUBPORT.
If not specified, a fixed value (50111) would be used.

My first reaction as a new user would be: why do I need to specify a port if my
--subscriber-conninfo already contains a port? Ugh. I'm wondering if we can do
it behind the scenes. Try a range of ports.
>

My initial motivation for the setting was to avoid establishing connections
during the pg_subscriber run. On further consideration, I started to think that
--subscriber-conninfo may not be needed. pg_upgrade does not require such a
string: it requires a username, and optionally a port number (which is used
during the upgrade), instead. The advantage of this approach is that we do not
have to parse the connection string.
What do you think?

>
2.
A FATAL error would be raised if --subscriber-conninfo specifies non-local server.

Extra protection is always good. However, let's make sure this code path is
really useful. I'll think a bit about it.
>

OK, I can wait for your consideration. Note that if we follow the pg_upgrade
style, we may be able to reuse check_pghost_envvar().

>
3.
Options -o/-O were added to specify options for publications/subscriptions.

Flexibility is cool. However, I think the cost benefit of it is not good. You
have to parse the options to catch preliminary errors. Things like publish only
delete and subscription options that conflicts with the embedded ones are
additional sources of failure.
>

As I already replied, let's stop doing it for now. We can resume based on demand.

>
4.
Made standby to save their output to log file.

It was already done in v5. I did in a different way.
>

Good. I felt that yours was better. BTW, can we record the output of pg_subscriber to a file as well?
pg_upgrade does a similar thing. Thoughts?

>
5.
Unnecessary Assert in drop_replication_slot() was removed.

Instead, I fixed the code and keep the assert.
>

Cool.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Here are comments for your v5 patch.

01.
In the document, the two words target/standby are used with almost the same
meaning. Can you unify them?

02.
```
 <refsynopsisdiv>
  <cmdsynopsis>
   <command>pg_subscriber</command>
   <arg rep="repeat"><replaceable>option</replaceable></arg>
  </cmdsynopsis>
 </refsynopsisdiv>
```

There are some mandatory options like -D/-S/-P. They must be listed in the Synopsis section.

03.
```
  <para>
   <application>pg_subscriber</application> takes the publisher and subscriber
   connection strings, a cluster directory from a standby server and a list of
   database names and it sets up a new logical replica using the physical
   recovery process.
  </para>
```

I briefly checked other pages, and they do not describe the accepted options here.
A summary of the application should be given instead. Based on that, how about:

```
pg_subscriber creates a new <link linkend="logical-replication-subscription">
subscriber</link> from a physical standby server. This allows users to quickly
set up a logical replication system.
```

04.
```
  <para>
   The <application>pg_subscriber</application> should be run at the target
   server. The source server (known as publisher server) should accept logical
   replication connections from the target server (known as subscriber server).
   The target server should accept local logical replication connection.
  </para>
```

I'm not a native speaker, but these are not just recommendations - they are
strictly required. So, should we replace s/should/has to/?

05.
```
     <varlistentry>
      <term><option>-S <replaceable class="parameter">conninfo</replaceable></option></term>
      <term><option>--subscriber-conninfo=<replaceable class="parameter">conninfo</replaceable></option></term>
      <listitem>
       <para>
        The connection string to the subscriber. For details see <xref linkend="libpq-connstring"/>.
       </para>
      </listitem>
     </varlistentry>
```

I'm no longer sure about "The connection string to the subscriber."
The server is still a physical standby at that time.

06.
```
 * IDENTIFICATION
 *        src/bin/pg_subscriber/pg_subscriber.c
```

The identification is not correct.

07.
I felt that there are too many global variables, and that LogicalRepInfo should
be refactored, because...

* Some cluster-related info (e.g., subscriber_dir, conninfo, ...) should be
  gathered into one struct.
* pubconninfo/subconninfo are stored per database, but that is not needed if we
  have one base_conninfo.
* pubname/subname are not needed because we have a fixed naming rule.
* pg_ctl_path and pg_resetwal_path can be combined into one bindir.
* num_dbs should not stand alone.
...

Based on above, how about using structures like below?

```
typedef struct LogicalRepPerdbInfo
{
    Oid        oid;
    char   *dbname;
    bool    made_replslot;    /* replication slot was created */
    bool    made_publication;    /* publication was created */
    bool    made_subscription;    /* subscription was created */
} LogicalRepPerdbInfo;

typedef struct PrimaryInfo
{
    char   *base_conninfo;
    bool    made_transient_replslot;
} PrimaryInfo;

typedef struct StandbyInfo
{
    char   *base_conninfo;
    char   *bindir;
    char   *pgdata;
    char   *primary_slot_name;
} StandbyInfo;

typedef struct LogicalRepInfo
{
    int                        num_dbs;
    LogicalRepPerdbInfo       *perdb;
    PrimaryInfo               *primary;
    StandbyInfo               *standby;
} LogicalRepInfo;
```
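To illustrate the refactoring point: with one base_conninfo and a fixed naming rule, the per-database connection strings and object names can be derived on demand instead of being stored. A rough sketch in Python (the function names and the `pg_subscriber_<oid>` naming rule are illustrative assumptions, not the actual C code):

```python
# Sketch: derive per-database values from one base connection string
# and a fixed naming rule, instead of storing pubconninfo/subconninfo,
# pubname and subname for every database.
def conninfo_for_db(base_conninfo: str, dbname: str) -> str:
    """Append the database name to a base connection string."""
    return f"{base_conninfo} dbname={dbname}"

def object_name(prefix: str, dboid: int) -> str:
    """Fixed naming rule: object names are derived from the database OID."""
    return f"{prefix}_{dboid}"

base = "host=primary port=5432 fallback_application_name=pg_subscriber"
print(conninfo_for_db(base, "postgres"))
print(object_name("pg_subscriber", 5))  # e.g., a publication name
```

With helpers like these, LogicalRepPerdbInfo only needs the OID and dbname plus the cleanup flags.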

08.
```
    char       *subconninfo;    /* subscription connection string for logical
                                 * replication */
```

Not sure how we should phrase this, because the target is not a subscriber yet.

09.
```
enum WaitPMResult
{
    POSTMASTER_READY,
    POSTMASTER_STANDBY,
    POSTMASTER_STILL_STARTING,
    POSTMASTER_FAILED
};
```

This enum is already defined in pg_ctl.c, and I'm not sure we can use the same
name. Can we rename ours to PGSWaitPMResult, or export the pre-existing one?

10.
```
/* Options */
static const char *progname;
```

I think progname is not an option, so this comment is misleading.

11.
```
/*
 * Validate a connection string. Returns a base connection string that is a
 * connection string without a database name plus a fallback application name.
 * Since we might process multiple databases, each database name will be
 * appended to this base connection string to provide a final connection string.
 * If the second argument (dbname) is not null, returns dbname if the provided
 * connection string contains it. If option --database is not provided, uses
 * dbname as the only database to setup the logical replica.
 * It is the caller's responsibility to free the returned connection string and
 * dbname.
 */
static char *
get_base_conninfo(char *conninfo, char *dbname, const char *noderole)
```
Just FYI - adding fallback_application_name here may be too optimistic. Currently
the output is used by both pg_subscriber itself and the subscription connection.

12.
Can we add an option to keep the log files even when the operation succeeds?

13.
```
        /*
         * Since the standby server is running, check if it is using an
         * existing replication slot for WAL retention purposes. This
         * replication slot has no use after the transformation, hence, it
         * will be removed at the end of this process.
         */
        primary_slot_name = use_primary_slot_name();
```

Currently primary_slot_name is checked only when the server has been started, but
it should be checked in all cases.

14.
```
    consistent_lsn = create_logical_replication_slot(conn, &dbinfo[0],
                                                     temp_replslot);
```

Can we create a temporary slot here?
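For reference, the distinction matters: a temporary slot is dropped automatically when the creating session ends, which would fit the transient consistent-LSN slot but not the per-database slots that the subscriptions will keep using afterwards. A quick sketch of the SQL involved (slot names are illustrative; the actual patch builds its commands in C via PQExpBuffer):

```python
def create_slot_sql(slot_name: str, temporary: bool) -> str:
    """Build the SQL to create a logical slot with the pgoutput plugin.
    The third argument of pg_create_logical_replication_slot() makes the
    slot temporary, i.e. dropped automatically at session end."""
    return (
        "SELECT lsn FROM pg_create_logical_replication_slot("
        f"'{slot_name}', 'pgoutput', {'true' if temporary else 'false'})"
    )

# transient slot used only to obtain the consistent LSN: could be temporary
print(create_slot_sql("pg_subscriber_startpoint", True))
# per-database slots reused by the subscriptions: must be permanent
print(create_slot_sql("pg_subscriber_5_1234", False))
```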

15.
I found that subscriptions cannot be started if tuples are inserted on the publisher
after temp_replslot is created. After starting the subscriber, I got the output below in the log.

```
ERROR:  could not receive data from WAL stream: ERROR:  publication "pg_subscriber_5" does not exist
CONTEXT:  slot "pg_subscriber_5_3632", output plugin "pgoutput", in the change callback, associated LSN 0/30008A8
LOG:  background worker "logical replication apply worker" (PID 3669) exited with exit code 1
```

But this is strange: I confirmed that the specified publication does exist.
Do you know the reason?

```
publisher=# SELECT pubname FROM pg_publication;
     pubname
-----------------
 pg_subscriber_5
(1 row)
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: speed up a logical replica setup

From
Nazir Bilal Yavuz
Date:
Hi,

On Fri, 12 Jan 2024 at 09:32, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Good, all warnings were removed. However, the patch failed to pass tests on FreeBSD twice.
>  I'm quite not sure the ERROR, but is it related with us?

The FreeBSD errors started after FreeBSD's CI image was updated [1]. I do
not think the error is related to this patch.

[1] https://cirrus-ci.com/task/4700394639589376

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Jan 12, 2024 at 12:02 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> I didn't review all items but ...
>
> 1.
> An option --port was added to control the port number for physical standby.
> Users can specify a port number via the option, or an environment variable PGSUBPORT.
> If not specified, a fixed value (50111) would be used.
>
> My first reaction as a new user would be: why do I need to specify a port if my
> --subscriber-conninfo already contains a port? Ugh. I'm wondering if we can do
> it behind the scenes. Try a range of ports.
> >
>
> My initial motivation of the setting was to avoid establishing connections
> during the pg_subscriber. While considering more, I started to think that
> --subscriber-conninfo may not be needed. pg_upgrade does not requires the
> string: it requries username, and optionally port number (which would be used
> during the upgrade) instead. The advantage of this approach is that we do not
> have to parse the connection string.
> How do you think?
>

+1. This seems worth considering. I think unless we have a good reason
to have this parameter, we should try to avoid it.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Junwang Zhao
Date:
Hi Euler,

On Fri, Jan 12, 2024 at 6:16 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Jan 11, 2024, at 9:18 AM, Hayato Kuroda (Fujitsu) wrote:
>
> I have been concerned that the patch has not been tested by cfbot due to the
> application error. Also, some comments were raised. Therefore, I created a patch
> to move forward.
>
>
> Let me send an updated patch to hopefully keep the CF bot happy. The following
> items are included in this patch:
>
> * drop physical replication slot if standby is using one [1].
> * cleanup small changes (copyright, .gitignore) [2][3]
> * fix getopt_long() options [2]
> * fix format specifier for some messages
> * move doc to Server Application section [4]
> * fix assert failure
> * ignore duplicate database names [2]
> * store subscriber server log into a separate file
> * remove MSVC support
>
> I'm still addressing other reviews and I'll post another version that includes
> it soon.
>
> [1] https://www.postgresql.org/message-id/e02a2c17-22e5-4ba6-b788-de696ab74f1e%40app.fastmail.com
> [2] https://www.postgresql.org/message-id/CALDaNm1joke42n68LdegN5wCpaeoOMex2EHcdZrVZnGD3UhfNQ%40mail.gmail.com
> [3]
https://www.postgresql.org/message-id/TY3PR01MB98895BA6C1D72CB8582CACC4F5682%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [4]
https://www.postgresql.org/message-id/TY3PR01MB988978C7362A101927070D29F56A2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
>

+ <refnamediv>
+  <refname>pg_subscriber</refname>
+  <refpurpose>create a new logical replica from a standby server</refpurpose>
+ </refnamediv>
I'm a bit confused by this wording, because we are converting a standby
into a logical replica, not creating a new logical replica and leaving the
standby as is. How about:

<refpurpose>convert a standby replica to a logical replica</refpurpose>

+  <para>
+   The <application>pg_subscriber</application> should be run at the target
+   server. The source server (known as publisher server) should accept logical
+   replication connections from the target server (known as subscriber server).
+   The target server should accept local logical replication connection.
+  </para>

What is *local logical replication*? I can't find any clue in the patch; can you
give me a hint?


>
> --
> Euler Taveira
> EDB   https://www.enterprisedb.com/
>


--
Regards
Junwang Zhao



Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Thu, Dec 21, 2023 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 6, 2023 at 12:53 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
> >
> > On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > > On 08.11.23 00:12, Michael Paquier wrote:
> > >> - Should the subdirectory pg_basebackup be renamed into something more
> > >> generic at this point?  All these things are frontend tools that deal
> > >> in some way with the replication protocol to do their work.  Say
> > >> a replication_tools?
> > >
> > > Seems like unnecessary churn.  Nobody has complained about any of the other
> > > tools in there.
> >
> > Not sure.  We rename things across releases in the tree from time to
> > time, and here that's straight-forward.
> >
> >
> > Based on this discussion it seems we have a consensus that this tool should be
> > in the pg_basebackup directory. (If/when we agree with the directory renaming,
> > it could be done in a separate patch.) Besides this move, the v3 provides a dry
> > run mode. It basically executes every routine but skip when should do
> > modifications. It is an useful option to check if you will be able to run it
> > without having issues with connectivity, permission, and existing objects
> > (replication slots, publications, subscriptions). Tests were slightly improved.
> > Messages were changed to *not* provide INFO messages by default and --verbose
> > provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> > also refactored the connect_database() function into which the connection will
> > always use the logical replication mode. A bug was fixed in the transient
> > replication slot name. Ashutosh review [1] was included. The code was also indented.
> >
> > There are a few suggestions from Ashutosh [2] that I will reply in another
> > email.
> >
> > I'm still planning to work on the following points:
> >
> > 1. improve the cleanup routine to point out leftover objects if there is any
> >    connection issue.
> >
>
> I think this is an important part. Shall we try to write to some file
> the pending objects to be cleaned up? We do something like that during
> the upgrade.
>
> > 2. remove the physical replication slot if the standby is using one
> >    (primary_slot_name).
> > 3. provide instructions to promote the logical replica into primary, I mean,
> >    stop the replication between the nodes and remove the replication setup
> >    (publications, subscriptions, replication slots). Or even include another
> >    action to do it. We could add both too.
> >
> > Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> > UI for users that would like to use it.
> >
>
> Isn't point 2 also essential because how would otherwise such a slot
> be advanced or removed?
>
> A few other points:
> ==============
> 1. Previously, I asked whether we need an additional replication slot
> patch created to get consistent LSN and I see the following comment in
> the patch:
>
> + *
> + * XXX we should probably use the last created replication slot to get a
> + * consistent LSN but it should be changed after adding pg_basebackup
> + * support.
>
> Yeah, sure, we may want to do that after backup support and we can
> keep a comment for the same but I feel as the patch stands today,
> there is no good reason to keep it. Also, is there a reason that we
> can't create the slots after backup is complete and before we write
> recovery parameters
>
> 2.
> + appendPQExpBuffer(str,
> +   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
> +   "WITH (create_slot = false, copy_data = false, enabled = false)",
> +   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);
>
> Shouldn't we enable two_phase by default for newly created
> subscriptions? Is there a reason for not doing so?
>
> 3. How about sync slots on the physical standby if present? Do we want
> to retain those as it is or do we need to remove those? We are
> actively working on the patch [1] for the same.
>
> 4. Can we see some numbers with various sizes of databases (cluster)
> to see how it impacts the time for small to large-size databases as
> compared to the traditional method? This might help us with giving
> users advice on when to use this tool. We can do this bit later as
> well when the patch is closer to being ready for commit.

I have done the performance testing and attached the results comparing
the execution time of 'logical replication' and 'pg_subscriber' for
100MB, 1GB and 5GB of data:

                  | 100MB  | 1GB     | 5GB
Logical rep (2 w) | 1.815s | 14.895s | 75.541s
Logical rep (4 w) | 1.194s | 9.484s  | 46.938s
Logical rep (8 w) | 0.828s | 6.422s  | 31.704s
Logical rep (10 w)| 0.646s | 3.843s  | 18.425s
pg_subscriber     | 3.977s | 9.988s  | 12.665s

Here, 'w' stands for 'workers'. I ran the tests with values of
'max_sync_workers_per_subscription' ranging from 2 to 10, and with
different row counts: 300,000 rows for 100MB, 3,000,000 for 1GB, and
15,000,000 for 5GB. It is observed that 'pg_subscriber' performs better
as the table size grows.
Next I plan to run these tests for 10GB and 20GB to see whether this
trend continues.
I'm attaching the script files with the details of the test scripts
used; the Excel file has the test run details. The 'pg_subscriber.pl'
file is to test with 'pg_subscriber' and the 'logical_rep.pl' file is
to test with 'Logical Replication'.

Thanks and Regards,
Shubham Khanna.

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler, hackers,

I found that some bugs reported by Shlok were not fixed, so I made a
top-up patch. 0001 is unchanged, and 0002 contains the following:

* Add a timeout option for the recovery option, per [1]. The code was basically ported from pg_ctl.c.
* Reject if the target server is not a standby, per [2]
* Raise a FATAL error if --subscriber-conninfo specifies a non-local server, per [3]
  (not sure it is really needed, so feel free to reject that part.)

Feel free to merge parts of 0002 if it looks good to you.
Thanks to Shlok for making part of the patch.

[1]: https://www.postgresql.org/message-id/CANhcyEUCt-g4JLQU3Q3ofFk_Vt-Tqh3ZdXoLcpT8fjz9LY_-ww%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CANhcyEUCt-g4JLQU3Q3ofFk_Vt-Tqh3ZdXoLcpT8fjz9LY_-ww%40mail.gmail.com
[3]:
https://www.postgresql.org/message-id/TY3PR01MB98895BA6C1D72CB8582CACC4F5682%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

I have some comments for the v5 patch:

1.
```
+ base_dir = (char *) pg_malloc0(MAXPGPATH);
+ len = snprintf(base_dir, MAXPGPATH, "%s/%s", subscriber_dir, PGS_OUTPUT_DIR);
```
Before these lines, I think we should use
'canonicalize_path(subscriber_dir)' to remove unnecessary characters.
This function is used in many places, such as initdb.c, pg_ctl.c,
pg_basebackup.c, etc.
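For illustration, a rough Python analogue of what canonicalize_path() achieves for Unix paths (the real C function also handles Windows separators, drive prefixes, and "." / ".." components explicitly):

```python
import posixpath

def canonicalize(path: str) -> str:
    """Rough analogue of PostgreSQL's canonicalize_path(): collapse
    duplicate separators and '.' components, drop a trailing slash."""
    return posixpath.normpath(path)

print(canonicalize("/data//subscriber/"))   # -> /data/subscriber
print(canonicalize("/data/./subscriber"))   # -> /data/subscriber
```

Without this cleanup, a user-supplied path like "/data/subscriber/" would produce "/data/subscriber//pg_subscriber_output.d"-style doubled separators in the snprintf above.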

2.
I also feel that there are many global variables that could be arranged
into structures, as suggested by Kuroda-san in [1]

[1]
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Thanks and Regards
Shlok Kyal


On Fri, 12 Jan 2024 at 03:46, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Jan 11, 2024, at 9:18 AM, Hayato Kuroda (Fujitsu) wrote:
>
> I have been concerned that the patch has not been tested by cfbot due to the
> application error. Also, some comments were raised. Therefore, I created a patch
> to move forward.
>
>
> Let me send an updated patch to hopefully keep the CF bot happy. The following
> items are included in this patch:
>
> * drop physical replication slot if standby is using one [1].
> * cleanup small changes (copyright, .gitignore) [2][3]
> * fix getopt_long() options [2]
> * fix format specifier for some messages
> * move doc to Server Application section [4]
> * fix assert failure
> * ignore duplicate database names [2]
> * store subscriber server log into a separate file
> * remove MSVC support
>
> I'm still addressing other reviews and I'll post another version that includes
> it soon.
>
> [1] https://www.postgresql.org/message-id/e02a2c17-22e5-4ba6-b788-de696ab74f1e%40app.fastmail.com
> [2] https://www.postgresql.org/message-id/CALDaNm1joke42n68LdegN5wCpaeoOMex2EHcdZrVZnGD3UhfNQ%40mail.gmail.com
> [3]
https://www.postgresql.org/message-id/TY3PR01MB98895BA6C1D72CB8582CACC4F5682%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [4]
https://www.postgresql.org/message-id/TY3PR01MB988978C7362A101927070D29F56A2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
>
>
> --
> Euler Taveira
> EDB   https://www.enterprisedb.com/
>



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 11.01.24 23:15, Euler Taveira wrote:
> A new tool called pg_subscriber can convert a physical replica into a 
> logical replica. It runs on the target server and should be able to 
> connect to the source server (publisher) and the target server (subscriber).

Can we have a discussion on the name?

I find the name pg_subscriber too general.

The replication/backup/recovery tools in PostgreSQL are usually named 
along the lines of "verb - object".  (Otherwise, they would all be 
called "pg_backup"??)  Moreover, "pg_subscriber" also sounds like the 
name of the program that runs the subscriber itself, like what the 
walreceiver does now.

Very early in this thread, someone mentioned the name 
pg_create_subscriber, and of course there is pglogical_create_subscriber 
as the historical predecessor.  Something along those lines seems better 
to me.  Maybe there are other ideas.




Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 18, 2024 at 2:49 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 11.01.24 23:15, Euler Taveira wrote:
> > A new tool called pg_subscriber can convert a physical replica into a
> > logical replica. It runs on the target server and should be able to
> > connect to the source server (publisher) and the target server (subscriber).
>
> Can we have a discussion on the name?
>
> I find the name pg_subscriber too general.
>
> The replication/backup/recovery tools in PostgreSQL are usually named
> along the lines of "verb - object".  (Otherwise, they would all be
> called "pg_backup"??)  Moreover, "pg_subscriber" also sounds like the
> name of the program that runs the subscriber itself, like what the
> walreceiver does now.
>
> Very early in this thread, someone mentioned the name
> pg_create_subscriber, and of course there is pglogical_create_subscriber
> as the historical predecessor.  Something along those lines seems better
> to me.  Maybe there are other ideas.
>

The other option could be pg_createsubscriber on the lines of
pg_verifybackup and pg_combinebackup. Yet other options could be
pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
sounds like we are doing some work to create the subscriber which I
think is the case here.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Tue, Jan 16, 2024 at 11:58 AM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Thu, Dec 21, 2023 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > 4. Can we see some numbers with various sizes of databases (cluster)
> > to see how it impacts the time for small to large-size databases as
> > compared to the traditional method? This might help us with giving
> > users advice on when to use this tool. We can do this bit later as
> > well when the patch is closer to being ready for commit.
>
> I have done the Performance testing and attached the results to
> compare the 'Execution Time' between 'logical replication' and
> 'pg_subscriber' for 100MB, 1GB and 5GB data:
>                             | 100MB | 1GB      | 5GB
> Logical rep (2 w) | 1.815s  | 14.895s | 75.541s
> Logical rep (4 w) | 1.194s  | 9.484s   | 46.938s
> Logical rep (8 w) | 0.828s  | 6.422s   | 31.704s
> Logical rep(10 w)| 0.646s  | 3.843s   | 18.425s
> pg_subscriber     | 3.977s  | 9.988s   | 12.665s
>
> Here, 'w' stands for 'workers'. I have included the tests to see the
> test result variations with different values for
> 'max_sync_workers_per_subscription' ranging from 2 to 10. I ran the
> tests for different data records; for 100MB I put  3,00,000 Records,
> for 1GB I put 30,00,000 Records and for 5GB I put 1,50,00,000 Records.
> It is observed that 'pg_subscriber' is better when the table size is
> more.
>

Thanks for the tests. IIUC, they show that for smaller data this tool can
take more time. Can we run perf to see if there is something we can do
to reduce the overhead?

> Next I plan to run these tests for 10GB and 20GB to see if this trend
> continues or not.
>

Okay, that makes sense.

With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Tue, Jan 16, 2024 at 11:58 AM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Thu, Dec 21, 2023 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 6, 2023 at 12:53 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
> > >
> > > On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > > > On 08.11.23 00:12, Michael Paquier wrote:
> > > >> - Should the subdirectory pg_basebackup be renamed into something more
> > > >> generic at this point?  All these things are frontend tools that deal
> > > >> in some way with the replication protocol to do their work.  Say
> > > >> a replication_tools?
> > > >
> > > > Seems like unnecessary churn.  Nobody has complained about any of the other
> > > > tools in there.
> > >
> > > Not sure.  We rename things across releases in the tree from time to
> > > time, and here that's straight-forward.
> > >
> > >
> > > Based on this discussion it seems we have a consensus that this tool should be
> > > in the pg_basebackup directory. (If/when we agree with the directory renaming,
> > > it could be done in a separate patch.) Besides this move, the v3 provides a dry
> > > run mode. It basically executes every routine but skip when should do
> > > modifications. It is an useful option to check if you will be able to run it
> > > without having issues with connectivity, permission, and existing objects
> > > (replication slots, publications, subscriptions). Tests were slightly improved.
> > > Messages were changed to *not* provide INFO messages by default and --verbose
> > > provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> > > also refactored the connect_database() function into which the connection will
> > > always use the logical replication mode. A bug was fixed in the transient
> > > replication slot name. Ashutosh review [1] was included. The code was also indented.
> > >
> > > There are a few suggestions from Ashutosh [2] that I will reply in another
> > > email.
> > >
> > > I'm still planning to work on the following points:
> > >
> > > 1. improve the cleanup routine to point out leftover objects if there is any
> > >    connection issue.
> > >
> >
> > I think this is an important part. Shall we try to write to some file
> > the pending objects to be cleaned up? We do something like that during
> > the upgrade.
> >
> > > 2. remove the physical replication slot if the standby is using one
> > >    (primary_slot_name).
> > > 3. provide instructions to promote the logical replica into primary, I mean,
> > >    stop the replication between the nodes and remove the replication setup
> > >    (publications, subscriptions, replication slots). Or even include another
> > >    action to do it. We could add both too.
> > >
> > > Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> > > UI for users that would like to use it.
> > >
> >
> > Isn't point 2 also essential because how would otherwise such a slot
> > be advanced or removed?
> >
> > A few other points:
> > ==============
> > 1. Previously, I asked whether we need an additional replication slot
> > patch created to get consistent LSN and I see the following comment in
> > the patch:
> >
> > + *
> > + * XXX we should probably use the last created replication slot to get a
> > + * consistent LSN but it should be changed after adding pg_basebackup
> > + * support.
> >
> > Yeah, sure, we may want to do that after backup support and we can
> > keep a comment for the same but I feel as the patch stands today,
> > there is no good reason to keep it. Also, is there a reason that we
> > can't create the slots after backup is complete and before we write
> > recovery parameters
> >
> > 2.
> > + appendPQExpBuffer(str,
> > +   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
> > +   "WITH (create_slot = false, copy_data = false, enabled = false)",
> > +   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);
> >
> > Shouldn't we enable two_phase by default for newly created
> > subscriptions? Is there a reason for not doing so?
> >
> > 3. How about sync slots on the physical standby if present? Do we want
> > to retain those as it is or do we need to remove those? We are
> > actively working on the patch [1] for the same.
> >
> > 4. Can we see some numbers with various sizes of databases (cluster)
> > to see how it impacts the time for small to large-size databases as
> > compared to the traditional method? This might help us with giving
> > users advice on when to use this tool. We can do this bit later as
> > well when the patch is closer to being ready for commit.
>
> I have done the Performance testing and attached the results to
> compare the 'Execution Time' between 'logical replication' and
> 'pg_subscriber' for 100MB, 1GB and 5GB data:
>                             | 100MB | 1GB      | 5GB
> Logical rep (2 w) | 1.815s  | 14.895s | 75.541s
> Logical rep (4 w) | 1.194s  | 9.484s   | 46.938s
> Logical rep (8 w) | 0.828s  | 6.422s   | 31.704s
> Logical rep(10 w)| 0.646s  | 3.843s   | 18.425s
> pg_subscriber     | 3.977s  | 9.988s   | 12.665s
>
> Here, 'w' stands for 'workers'. I have included the tests to see the
> test result variations with different values for
> 'max_sync_workers_per_subscription' ranging from 2 to 10. I ran the
> tests for different data records; for 100MB I put  3,00,000 Records,
> for 1GB I put 30,00,000 Records and for 5GB I put 1,50,00,000 Records.
> It is observed that 'pg_subscriber' is better when the table size is
> more.
> Next I plan to run these tests for 10GB and 20GB to see if this trend
> continues or not.

I have done the performance testing and attached the results comparing
the execution time of 'logical replication' and 'pg_subscriber' for
10GB and 20GB of data:

                  | 10GB     | 20GB
Logical rep (2 w) | 157.131s | 343.191s
Logical rep (4 w) | 116.627s | 240.480s
Logical rep (8 w) | 95.237s  | 275.715s
Logical rep (10 w)| 92.792s  | 280.538s
pg_subscriber     | 22.734s  | 25.661s

As expected, pg_subscriber is much better in these cases: approximately
7x faster for 10GB and 13x faster for 20GB.
I'm attaching the script files with the details of the test scripts
used; the Excel file has the test run details. The 'pg_subscriber.pl'
file is for 'Streaming Replication' and the 'logical_replication.pl'
file is for 'Logical Replication'.
Note: For 20GB the record count should be changed to 60,000,000, and
'max_sync_workers_per_subscription' needs to be adjusted for the
logical replication tests with different worker counts.
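Taking the 2-worker logical replication times from the table above as the baseline, the claimed speedups check out (a quick arithmetic check; numbers copied from the table):

```python
# Execution times (seconds): 2-worker logical replication vs pg_subscriber.
times = {
    "10GB": {"logical_rep_2w": 157.131, "pg_subscriber": 22.734},
    "20GB": {"logical_rep_2w": 343.191, "pg_subscriber": 25.661},
}
for size, t in times.items():
    ratio = t["logical_rep_2w"] / t["pg_subscriber"]
    print(f"{size}: pg_subscriber is about {ratio:.1f}x faster")
```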

Thanks and Regards,
Shubham Khanna.

Attachment

Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 18.01.24 10:37, Amit Kapila wrote:
> The other option could be pg_createsubscriber on the lines of
> pg_verifybackup and pg_combinebackup.

Yes, that spelling would be more consistent.

> Yet other options could be
> pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
> sounds like we are doing some work to create the subscriber which I
> think is the case here.

I see your point here.  pg_createsubscriber is not like createuser in 
that it just runs an SQL command.  It does something different than 
CREATE SUBSCRIBER.  So a different verb would make that clearer.  Maybe 
something from here: https://www.thesaurus.com/browse/convert




RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

>
> 15.
> I found that subscriptions cannot be started if tuples are inserted on publisher
> after creating temp_replslot. After starting a subscriber, I got below output on the
> log.
>
> ```
> ERROR:  could not receive data from WAL stream: ERROR:  publication
> "pg_subscriber_5" does not exist
> CONTEXT:  slot "pg_subscriber_5_3632", output plugin "pgoutput", in the change
> callback, associated LSN 0/30008A8
> LOG:  background worker "logical replication apply worker" (PID 3669) exited
> with exit code 1
> ```
>
> But this is strange. I confirmed that the specified publication surely exists.
> Do you know the reason?
>
> ```
> publisher=# SELECT pubname FROM pg_publication;
>      pubname
> -----------------
>  pg_subscriber_5
> (1 row)
> ```
>

I analyzed it and found the reason: the publication is invisible to some
transactions.

To begin with, the operations below were executed in this case. Tuples
were inserted after the consistent_lsn was obtained but before the standby
was started. After running the workload, I confirmed again that the
publication had been created.

1. on primary, logical replication slots were created.
2. on primary, another replication slot was created.
3. ===on primary, some tuples were inserted. ===
4. on standby, a server process was started
5. on standby, the process waited until all changes had arrived.
6. on primary, publications were created.
7. on standby, subscriptions were created.
8. on standby, the replication progress for each subscription was set to the given LSN (obtained at step 2).
=====pg_subscriber finished here=====
9. on standby, a server process was started again
10. on standby, subscriptions were enabled. They referred to the slots created at step 1.
11. on primary, decoding started but an ERROR was raised.

In this case, tuples were inserted *before the publication was created*.
So the decoded transaction could not see the publication, because the
publication was committed after the insertions.

One solution is to create the publication before creating the consistent
slot. Changes that arrived before the slot was created are certainly
replicated to the standby, so subsequent transactions can see the object.
We are planning a patch set that fixes the issue using this approach.
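The visibility problem can be sketched with a toy model: logical decoding sees the catalog as of each decoded transaction, so a change made before the publication's creating transaction committed cannot see the publication (LSN values below are purely illustrative):

```python
# Toy model: a catalog object is visible to a decoded change only if the
# object was created at an earlier LSN than the change itself.  This
# mirrors why decoding the early INSERT fails with "publication does not
# exist" even though the publication exists by the time we look.
def publication_visible(publication_lsn: int, change_lsn: int) -> bool:
    return publication_lsn <= change_lsn

INSERT_LSN = 110          # step 3: tuples inserted
PUBLICATION_LSN = 120     # step 6: publication created (too late)

# decoding the insert: publication not yet visible -> ERROR
assert not publication_visible(PUBLICATION_LSN, INSERT_LSN)

# proposed fix: create the publication before the consistent slot, so
# every change the slot can ever decode is later than the publication
PUBLICATION_LSN_FIXED = 90
assert publication_visible(PUBLICATION_LSN_FIXED, INSERT_LSN)
print("with the reordering, the publication is visible to all decoded changes")
```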


Best Regards,
Hayato Kuroda
FUJITSU LIMITED




RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Peter,

> 
> > Yet other options could be
> > pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
> > sounds like we are doing some work to create the subscriber which I
> > think is the case here.
> 
> I see your point here.  pg_createsubscriber is not like createuser in
> that it just runs an SQL command.  It does something different than
> CREATE SUBSCRIBER.  So a different verb would make that clearer.  Maybe
> something from here: https://www.thesaurus.com/browse/convert

I read the link and found a good verb "switch". So, how about using "pg_switchsubscriber"?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Jan 22, 2024 at 2:38 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
> >
> > > Yet other options could be
> > > pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
> > > sounds like we are doing some work to create the subscriber which I
> > > think is the case here.
> >
> > I see your point here.  pg_createsubscriber is not like createuser in
> > that it just runs an SQL command.  It does something different than
> > CREATE SUBSCRIBER.

Right.

> > So a different verb would make that clearer.  Maybe
> > something from here: https://www.thesaurus.com/browse/convert
>
> I read the link and found a good verb "switch". So, how about using "pg_switchsubscriber"?
>

I also initially thought along these lines and came up with a name like
pg_convertsubscriber, but didn't feel strongly about it, as it would only
have sounded meaningful as something like pg_convertstandbytosubscriber,
which is too long. Having said that, I am not opposed to a name along
those lines. BTW, another option that occurred to me today is
pg_preparesubscriber. We internally create slots and then wait for WAL,
etc., which makes me think adding 'prepare' to the name could also
explain the purpose.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Dear Euler, hackers,

We have addressed some of the comments posted in the thread, creating the
fixes as top-up patches 0002 and 0003.

The 0002 patch contains the following changes:
* Add a timeout option for the recovery step, per [1]. The code was
basically ported from pg_ctl.c.
* Reject the target server if it is not a standby, per [2]
* Raise a FATAL error if --subscriber-conninfo specifies a non-local server, per [3]
  (not sure it is really needed, so feel free to reject that part.)
* Add checks for max_replication_slots and wal_level, per [4]
* Add -u and -p options, per [5]
* Address all comments except 5 and 8 in [6] and the comment in [7]

The 0003 patch contains a fix for the bug reported in [8].

Feel free to merge parts of 0002 and 0003 if they look good to you.
Thanks to Kuroda-san for making patch 0003 and part of patch 0002.

[1]: https://www.postgresql.org/message-id/CANhcyEUCt-g4JLQU3Q3ofFk_Vt-Tqh3ZdXoLcpT8fjz9LY_-ww%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CANhcyEUCt-g4JLQU3Q3ofFk_Vt-Tqh3ZdXoLcpT8fjz9LY_-ww%40mail.gmail.com
[3]:
https://www.postgresql.org/message-id/TY3PR01MB98895BA6C1D72CB8582CACC4F5682%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[4]: https://www.postgresql.org/message-id/CALDaNm098Jkbh%2Bye6zMj9Ro9j1bBe6FfPV80BFbs1%3DpUuTJ07g%40mail.gmail.com
[5]: https://www.postgresql.org/message-id/CAA4eK1JB_ko7a5JMS3WfAn583RadAKCDhiE9JgmfMA8ZZ5xcQw%40mail.gmail.com
[6]:
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[7]: https://www.postgresql.org/message-id/CANhcyEXjGmryoZPACS_i-joqvcz5e6Zb3u4g38SAy_iSTGhShg%40mail.gmail.com
[8]:
https://www.postgresql.org/message-id/TY3PR01MB9889C5D55206DDD978627D07F5752%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Thanks and regards
Shlok Kyal

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

> We fixed some of the comments posted in the thread. We have created it
> as top-up patch 0002 and 0003.

I found that the CFbot raised an ERROR. Also, it may not work well in the case
of a production build.
PSA the fixed patch set.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Jan 22, 2024, at 6:22 AM, Amit Kapila wrote:
On Mon, Jan 22, 2024 at 2:38 PM Hayato Kuroda (Fujitsu)
> >
> > > Yet other options could be
> > > pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
> > > sounds like we are doing some work to create the subscriber which I
> > > think is the case here.
> >
> > I see your point here.  pg_createsubscriber is not like createuser in
> > that it just runs an SQL command.  It does something different than
> > CREATE SUBSCRIBER.

Right.

Subscriber has a different meaning than subscription. Subscription is an SQL
object. Subscriber is the server (node in replication terminology) where the
subscription resides. Having said that, pg_createsubscriber doesn't seem a bad
name because you are creating a new subscriber. (Indeed, you are transforming /
converting, but "create" seems closer, and users can infer that it is a tool to
build a new logical replica.)

  So a different verb would make that clearer.  Maybe
>
> I read the link and found a good verb "switch". So, how about using "pg_switchsubscriber"?
>

I also initially thought along these lines and came up with a name like
pg_convertsubscriber, but didn't feel strongly about it, as that would only
have sounded meaningful with a name like pg_convertstandbytosubscriber,
which is too long. Having said that, I am not opposed to a name along those
lines. BTW, another option that occurred to me today is pg_preparesubscriber.
We internally create slots and then wait for WAL, etc., which makes me think
that adding 'prepare' to the name could also convey the purpose.

I think "convert" and "transform" fit this case. However, "create",
"convert" and "transform" have 6, 7 and 9 characters, respectively. I suggest
that we avoid long names (subscriber alone already has 10 characters). My
preference is pg_createsubscriber.


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Jan 22, 2024, at 4:06 AM, Hayato Kuroda (Fujitsu) wrote:
I analyzed it and found the reason. This is because publications are invisible to some transactions.

To begin with, the operations below were executed in this case.
Tuples were inserted after getting the consistent_lsn, but before starting the standby.
After running the workload, I confirmed again that the publication had been created.

1. on primary, logical replication slots were created.
2. on primary, another replication slot was created.
3. ===on primary, some tuples were inserted. ===
4. on standby, a server process was started
5. on standby, the process waited until all changes had arrived.
6. on primary, publications were created.
7. on standby, subscriptions were created.
8. on standby, the replication progress for each subscription was set to the given LSN (obtained at step 2).
=====pg_subscriber finished here=====
9. on standby, a server process was started again
10. on standby, subscriptions were enabled. They referred to the slots created at step 1.
11. on primary, decoding was started but an ERROR was raised.

Good catch! It is a design flaw.

In this case, tuples were inserted *before creating the publication*.
So I thought that the decoded transaction could not see the publication because
the publication was committed after the insertions.

One solution is to create the publication before creating the consistent slot.
Changes that came before creating the slot were surely replicated to the standby,
so upcoming transactions can see the object. We are planning a patch set that
fixes the issue using this approach.

I'll include a similar code in the next patch and also explain why we should
create the publication earlier. (I'm renaming
create_all_logical_replication_slots to setup_publisher and calling
create_publication from there and also adding the proposed GUC checks in it.)


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Jan 22, 2024, at 6:30 AM, Shlok Kyal wrote:
We have addressed some of the comments posted in the thread, creating the
fixes as top-up patches 0002 and 0003.

Cool.

The 0002 patch contains the following changes:
* Add a timeout option for the recovery step, per [1]. The code was
basically ported from pg_ctl.c.
* Reject the target server if it is not a standby, per [2]
* Raise a FATAL error if --subscriber-conninfo specifies a non-local server, per [3]
  (not sure it is really needed, so feel free to reject that part.)
* Add checks for max_replication_slots and wal_level, per [4]
* Add -u and -p options, per [5]
* Address all comments except 5 and 8 in [6] and the comment in [7]

My suggestion is that you create separate patches for each change. It helps
with review and alternative proposals. Some of these items conflict with what I
have in my local branch, and removing one of them is time-consuming. For this
one, I did the job, but let's avoid rework.

The 0003 patch contains a fix for the bug reported in [8].

LGTM. As I said in the other email, I included it.

I'll post a new one soon.


--
Euler Taveira

Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Tue, Jan 23, 2024 at 7:41 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > We fixed some of the comments posted in the thread. We have created it
> > as top-up patch 0002 and 0003.
>
> I found that the CFbot raised an ERROR. Also, it may not work well in case
> of production build.
> PSA the fixed patch set.

A segmentation fault was found when testing the following command (there is
an extra '/' between 'new_standby2' and '-P'): '$ gdb --args
./pg_subscriber -D ../new_standby2 / -P "host=localhost
port=5432 dbname=postgres" -d postgres'
While executing the above command, I got the following error:
pg_subscriber: error: too many command-line arguments (first is "/")
pg_subscriber: hint: Try "pg_subscriber --help" for more information.

Program received signal SIGSEGV, Segmentation fault.
0x0000555555557e5b in cleanup_objects_atexit () at pg_subscriber.c:173
173            if (perdb->made_subscription)
(gdb) p perdb
$1 = (LogicalRepPerdbInfo *) 0x0

Thanks and Regards,
Shubham Khanna.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

>  Segmentation fault was found after testing the given command(There is
> an extra '/' between 'new_standby2' and '-P') '$ gdb --args
> ./pg_subscriber -D ../new_standby2 / -P "host=localhost
> port=5432 dbname=postgres" -d postgres'
> While executing the above command, I got the following error:
> pg_subscriber: error: too many command-line arguments (first is "/")
> pg_subscriber: hint: Try "pg_subscriber --help" for more information.
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000555555557e5b in cleanup_objects_atexit () at pg_subscriber.c:173
> 173            if (perdb->made_subscription)
> (gdb) p perdb
> $1 = (LogicalRepPerdbInfo *) 0x0
>

Good catch; I could reproduce the issue. This crash occurred because the
cleanup function was called before the memory was initialized.

There are several ways to fix it, but I chose to move the callback registration
later. The function does actual work only after database objects are created,
so patch 0004 registers the callback just before creating them; the memory
allocation has already been done at that point. If required, an Assert() can be
added in the callback.

Can you test it and confirm that the issue is solved?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

Based on the requirement, I have profiled the performance test. It showed that
in the small-data case there are mainly two bottlenecks: starting a server and
waiting until the recovery is done.

# Tested source code

The v7 patch set was applied atop HEAD (0eb23285). No configure options were
specified when it was built.

# Tested workload

I focused only on the 100MB/1GB cases because the bigger ones already had good
performance (the number of inserted tuples was the same as in previous tests).
I used a bash script instead of the TAP test framework. See attached. The
executed SQL and operations were almost the same.

As you can see, I tested only the one-db case. Results may change if the number
of databases is changed.

# Measurement
Some debug logs that output the current time were added (please see the diff
file). I picked some events and took measurements before and after them. The
bullets below show the measured events:

* Starting a server
* Stopping a server
* Creating replication slots
* Creating publications
* Waiting until the recovery ended
* Creating subscriptions

# Result
The table below shows the elapsed time for these events. Raw data is also
available in the attached Excel file.

|Event category                  |100MB case [sec]|1GB [sec]|
|Starting a server               |1.414           |1.417    |
|Stopping a server               |0.506           |0.506    |
|Creating replication slots      |0.005           |0.007    |
|Creating publications           |0.001           |0.002    |
|Waiting until the recovery ended|1.603           |14.529   |
|Creating subscriptions          |0.012           |0.012    |
|Total                           |3.541           |16.473   |
|actual time                     |4.37            |17.271   |


As you can see, starting servers and waiting seem slow. We cannot omit these,
but setting a smaller shared_buffers would reduce the start time. One approach
is to override the GUC with a smaller value, but I don't think we can determine
an appropriate value.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

Here are comments for v8 patch set. I may revise them by myself,
but I want to post here to share all of them.

01.
```
/* Options */
static char *pub_conninfo_str = NULL;
static SimpleStringList database_names = {NULL, NULL};
static int    wait_seconds = DEFAULT_WAIT;
static bool retain = false;
static bool dry_run = false;
```

Just to confirm - is there a policy on how we store the specified options? If
we store them as globals, username and port should follow suit (my fault...).
Or should we have a structure to store them?

02.
```
{"subscriber-conninfo", required_argument, NULL, 'S'},
```

This is my fault, but "--subscriber-conninfo" still remains. It should be
removed if it is not really needed.

03.
```
{"username", required_argument, NULL, 'u'},
```

Should we accept 'U' instead of 'u'?

04.
```
{"dry-run", no_argument, NULL, 'n'},
```

I'm not sure why the dry_run mode exists. For pg_resetwal, it shows which
values would be changed based on the input. As for pg_upgrade, it checks
whether the node can be upgraded. I think we should have the checking feature,
so it should be renamed to --check. Also, the process should exit earlier in
that case.

05.
I felt we should accept some settings from environment variables, like pg_upgrade.
Currently, the items below should be accepted.

- data directory
- username
- port
- timeout

06.
```
pg_logging_set_level(PG_LOG_WARNING);
```

If the default log level is warning, there is no way to output debug logs
(a single -v option only raises the level by one, so INFO would be output).
I think it should be PG_LOG_INFO.

07.
Can we combine verifications into two functions, e.g., check_primary() and check_standby/check_subscriber()?

08.
Not sure, but if we want to record the output of pg_subscriber, a sub-directory
should be created. Its name should contain the timestamp.

09.
Not sure, but should we check max_slot_wal_keep_size on the primary server? It
can avoid a failure when starting logical replication.

10.
```
    nslots_new = nslots_old + dbarr.ndbs;

    if (nslots_new > max_replication_slots)
    {
        pg_log_error("max_replication_slots (%d) must be greater than or equal to "
                     "the number of replication slots required (%d)", max_replication_slots, nslots_new);
        exit(1);
    }
```

I think the standby server must not have replication slots, because the
subsequent pg_resetwal command discards all the WAL files, so the WAL records
they point to are removed. Currently pg_resetwal does not raise an ERROR in
that case.

11.
```
    /*
     * Stop the subscriber if it is a standby server. Before executing the
     * transformation steps, make sure the subscriber is not running because
     * one of the steps is to modify some recovery parameters that require a
     * restart.
     */
    if (stat(pidfile, &statbuf) == 0)
```

I kept it just in case, but I'm not sure it is still needed. What do you think?
Removing it would remove the need to include pidfile.h.

12.
```
        pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s",
                              standby.bindir, standby.pgdata);
        rc = system(pg_ctl_cmd);
        pg_ctl_status(pg_ctl_cmd, rc, 0);
```


There are two places to stop the instance. Can you divide it into a function?

13.
```
     * A temporary replication slot is not used here to avoid keeping a
     * replication connection open (depending when base backup was taken, the
     * connection should be open for a few hours).
     */
    conn = connect_database(primary.base_conninfo, dbarr.perdb[0].dbname);
    if (conn == NULL)
        exit(1);
    consistent_lsn = create_logical_replication_slot(conn, true,
                                                     &dbarr.perdb[0]);
```

I hadn't noticed the comment, and I'm still not sure of the reason. Why must we
keep the slot until pg_subscriber finishes? IIUC, the slot is never used; it is
created only to get a consistent_lsn, so we do not have to keep it.
Also, just before that, logical replication slots for each database are created,
so the WAL records are surely retained.

14.

```
    pg_log_info("starting the subscriber");
    start_standby_server(&standby, subport, server_start_log);
```

This info should be in the function.

15.
```
    /*
     * Create a subscription for each database.
     */
    for (i = 0; i < dbarr.ndbs; i++)
```

This can be divided into a function, like create_all_subscriptions().

16.
My fault: usage() must be updated.

17. use_primary_slot_name
```
    if (PQntuples(res) != 1)
    {
        pg_log_error("could not obtain replication slot information: got %d rows, expected %d row",
                     PQntuples(res), 1);
        return NULL;
    }
```

The error message should be changed. I think this error means the standby has a wrong primary_slot_name, right?

18. misc
Sometimes the pid of pg_subscriber is referred to. It could be stored in a global variable.

19.
C99-style has been allowed, so loop variables like "i" can be declared in the for-statement, like

```
for (int i = 0; i < MAX; i++)
```

20.
Some comments, docs, and outputs must be fixed when the name is changed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 24.01.24 00:44, Euler Taveira wrote:
> Subscriber has a different meaning of subscription. Subscription is an SQL
> object. Subscriber is the server (node in replication terminology) where the
> subscription resides. Having said that pg_createsubscriber doesn't seem 
> a bad
> name because you are creating a new subscriber. (Indeed, you are 
> transforming /
> converting but "create" seems closer and users can infer that it is a 
> tool to
> build a new logical replica.

That makes sense.

(Also, the problem with "convert" etc. is that "convertsubscriber" would 
imply that you are converting an existing subscriber to something else. 
It would need to be something like "convertbackup" then, which doesn't 
seem helpful.)

> I think "convert" and "transform" fit for this case. However, "create",
> "convert" and "transform" have 6, 7 and 9 characters,  respectively. I 
> suggest
> that we avoid long names (subscriber already has 10 characters). My 
> preference
> is pg_createsubscriber.

That seems best to me.




RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Peter,

Thanks for giving your idea!

> > Subscriber has a different meaning of subscription. Subscription is an SQL
> > object. Subscriber is the server (node in replication terminology) where the
> > subscription resides. Having said that pg_createsubscriber doesn't seem
> > a bad
> > name because you are creating a new subscriber. (Indeed, you are
> > transforming /
> > converting but "create" seems closer and users can infer that it is a
> > tool to
> > build a new logical replica.
> 
> That makes sense.
> 
> (Also, the problem with "convert" etc. is that "convertsubscriber" would
> imply that you are converting an existing subscriber to something else.
> It would need to be something like "convertbackup" then, which doesn't
> seem helpful.)
> 
> > I think "convert" and "transform" fit for this case. However, "create",
> > "convert" and "transform" have 6, 7 and 9 characters,  respectively. I
> > suggest
> > that we avoid long names (subscriber already has 10 characters). My
> > preference
> > is pg_createsubscriber.
> 
> That seems best to me.

Just FYI - I'm ok to change the name to pg_createsubscriber.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Jan 23, 2024, at 10:29 PM, Euler Taveira wrote:
I'll post a new one soon.

I'm attaching another patch that fixes some of the issues pointed out by
Hayato, Shlok, and Junwang.

* publication doesn't exist. The analysis [1] was done by Hayato but I didn't
  use the proposed patch. Instead I refactored the code a bit [2] and call it
  from a new function (setup_publisher) that is called before the promotion.
* fix wrong path name in the initial comment [3]
* change terminology: logical replica -> physical replica [3]
* primary / standby ready for logical replication? setup_publisher() and
  setup_subscriber() check whether the required GUCs are set accordingly. For
  the primary, it checks that wal_level = logical, that max_replication_slots
  has enough remaining replication slots for the proposed setup, and that
  max_wal_senders has senders available. For the standby, it checks
  max_replication_slots for the replication origins and also the remaining
  number of background workers to start the subscriber.
* retain option: I extracted this one from Hayato's patch [4]
* target server must be a standby. It seems we agree that this restriction
  simplifies the code a bit but can be relaxed in the future (if/when base
  backup support is added.)
* recovery timeout option: I decided to include it, but I think the use case is
  too narrow. It helps in broken setups. However, it can be an issue in some
  scenarios, such as a time-delayed replica, a large replication lag, slow
  hardware, or a slow network. I didn't use the proposed patch [5]. Instead, I
  came up with a simple one that defaults to forever. The proposed one defaults
  to 60 seconds, but I'm afraid that in one of the scenarios mentioned above we
  would cancel a legitimate run. Maybe we should add a message during dry run
  saying that due to replication lag, it will take longer to run.
* refactor primary_slot_name code. With the new setup_publisher and
  setup_subscriber functions, I split the function that detects the
  primary_slot_name use into two pieces, just to avoid extra connections to get
  the job done.
* remove fallback_application_name as suggested by Hayato [5] because logical
  replication already includes one.

I'm still thinking about replacing --subscriber-conninfo with separate items
(username, port, password?, host = socket dir). Maybe it is overengineering.
The user can always prepare the environment to avoid unwanted and/or external
connections.

I haven't changed the name from pg_subscriber to pg_createsubscriber yet, but
if I don't hear objections, I'll do it in the next patch.




--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Jan 25, 2024, at 6:05 AM, Hayato Kuroda (Fujitsu) wrote:
01.
```
/* Options */
static char *pub_conninfo_str = NULL;
static SimpleStringList database_names = {NULL, NULL};
static int wait_seconds = DEFAULT_WAIT;
static bool retain = false;
static bool dry_run = false;
```

Just to confirm - is there a policy on how we store the specified options? If
we store them as globals, username and port should follow suit (my fault...).
Or should we have a structure to store them?

It is a matter of style, I would say. Check other client applications. Some of
them also use global variables. There are others that group options into a
struct. I would say that since the options have a short lifetime, the current
style is not harmful.

04.
```
{"dry-run", no_argument, NULL, 'n'},
```

I'm not sure why the dry_run mode exists. For pg_resetwal, it shows which
values would be changed based on the input. As for pg_upgrade, it checks
whether the node can be upgraded. I think we should have the checking feature,
so it should be renamed to --check. Also, the process should exit earlier in
that case.

It is extremely useful because (a) you have a physical replication setup and
don't know if it is prepared for logical replication; (b) it checks GUCs (is
max_wal_senders sufficient for this pg_subscriber command? Is
max_replication_slots sufficient to set up the logical replication even though
I already have some used replication slots?); (c) connectivity; and (d)
credentials.

05.
I felt we should accept some settings from environment variables, like pg_upgrade.
Currently, the items below should be accepted.

- data directory
- username
- port
- timeout

Maybe PGDATA.

06.
```
pg_logging_set_level(PG_LOG_WARNING);
```

If the default log level is warning, there is no way to output debug logs
(a single -v option only raises the level by one, so INFO would be output).
I think it should be PG_LOG_INFO.

You need to specify multiple -v options.

07.
Can we combine verifications into two functions, e.g., check_primary() and check_standby/check_subscriber()?

I think v9 does it.

08.
Not sure, but if we want to record the output of pg_subscriber, a sub-directory
should be created. Its name should contain the timestamp.

The log file already contains the timestamp. Why?

09.
Not sure, but should we check max_slot_wal_keep_size on the primary server? It
can avoid a failure when starting logical replication.

Is physical replication that was broken *before* running this tool its
responsibility? Hmm. We might add another check that can be surfaced during dry
run mode.

10.
```
nslots_new = nslots_old + dbarr.ndbs;

if (nslots_new > max_replication_slots)
{
pg_log_error("max_replication_slots (%d) must be greater than or equal to "
"the number of replication slots required (%d)", max_replication_slots, nslots_new);
exit(1);
}
```

I think the standby server must not have replication slots, because the
subsequent pg_resetwal command discards all the WAL files, so the WAL records
they point to are removed. Currently pg_resetwal does not raise an ERROR in
that case.

Again, dry run mode might provide a message for it.

11.
```
/*
* Stop the subscriber if it is a standby server. Before executing the
* transformation steps, make sure the subscriber is not running because
* one of the steps is to modify some recovery parameters that require a
* restart.
*/
if (stat(pidfile, &statbuf) == 0)
```

I kept it just in case, but I'm not sure it is still needed. What do you think?
Removing it would remove the need to include pidfile.h.

Are you suggesting another way to check if the standby is up and running?

12.
```
pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s",
  standby.bindir, standby.pgdata);
rc = system(pg_ctl_cmd);
pg_ctl_status(pg_ctl_cmd, rc, 0);
```


There are two places to stop the instance. Can you divide it into a function?

Yes.

13.
```
* A temporary replication slot is not used here to avoid keeping a
* replication connection open (depending when base backup was taken, the
* connection should be open for a few hours).
*/
conn = connect_database(primary.base_conninfo, dbarr.perdb[0].dbname);
if (conn == NULL)
exit(1);
consistent_lsn = create_logical_replication_slot(conn, true,
&dbarr.perdb[0]);
```

I hadn't noticed the comment, and I'm still not sure of the reason. Why must we
keep the slot until pg_subscriber finishes? IIUC, the slot is never used; it is
created only to get a consistent_lsn, so we do not have to keep it.
Also, just before that, logical replication slots for each database are created,
so the WAL records are surely retained.

This comment needs to be updated. It was written at the time I was pursuing
base backup support too. It doesn't matter if you remove this transient
replication slot earlier, because all of the replication slots created for the
subscriptions were created *before* the one for the consistent LSN. Hence,
there is no additional WAL retention due to this transient replication slot.

14.

```
pg_log_info("starting the subscriber");
start_standby_server(&standby, subport, server_start_log);
```

This info should be in the function.

Ok.

15.
```
/*
* Create a subscription for each database.
*/
for (i = 0; i < dbarr.ndbs; i++)
```

This can be divided into a function, like create_all_subscriptions().

Ok.

16.
My fault: usage() must be updated.

17. use_primary_slot_name
```
if (PQntuples(res) != 1)
{
pg_log_error("could not obtain replication slot information: got %d rows, expected %d row",
PQntuples(res), 1);
return NULL;
}
```

Error message should be changed. I think this error means the standby has wrong primary_slot_name, right?

I refactored this code a bit, but the message is the same. It detects 2 cases:
(a) you set primary_slot_name but don't have a replication slot with the same
name, and (b) a cannot-happen bug that returns > 1 rows. It is a broken setup,
so maybe a hint saying so is enough.

18. misc
Sometimes the pid of pg_subscriber is referred to. It could be stored in a global variable.

I prefer to keep getpid() call.

19.
C99-style has been allowed, so loop variables like "i" can be declared in the for-statement, like

```
for (int i = 0; i < MAX; i++)
```

v9 does it.

20.
Some comments, docs, and outputs must be fixed when the name is changed.

Next patch.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch! Before reading yours, I wanted to reply to some of the comments.

>
I'm still thinking about replacing --subscriber-conninfo with separate items
(username, port, password?, host = socket dir). Maybe it is an overengineering.
The user can always prepare the environment to avoid unwanted and/or external
connections.
>

For me, the required amount of fixes is not so different from the current one.
What do others think?

>
It is extremely useful because (a) you have a physical replication setup and
don't know if it is prepared for logical replication, (b) check GUCs (is
max_wal_senders sufficient for this pg_subscriber command? Or is
max_replication_slots sufficient to setup the logical replication even though I
already have some used replication slots?), (c) connectivity and (d)
credentials.
>

Yeah, it is useful for verification purposes, so let's keep this option.
But I still think the name should be "--check". Also, there are many
`if (!dry_run)` checks, but most of them could be removed if the process
exited earlier. Thoughts?


>
> 05.
> I felt we should accept some settings from enviroment variables, like pg_upgrade.
> Currently, below items should be acceted.
>
> - data directory
> - username
> - port
> - timeout

Maybe PGDATA.
>

Sorry, I cannot follow this. Did you mean that the target data directory should
be able to be specified via PGDATA? If so, +1.

>
> 06.
> ```
> pg_logging_set_level(PG_LOG_WARNING);
> ```
>
> If the default log level is warning, there are no ways to output debug logs.
> (-v option only raises one, so INFO would be output)
> I think it should be PG_LOG_INFO.

You need to specify multiple -v options.
>

Hmm. I felt the specification was a bit strange... but at least it must be
described in the documentation. pg_dump.sgml has similar lines.

>
> 08.
> Not sure, but if we want to record outputs by pg_subscriber, the sub-directory
> should be created. The name should contain the timestamp.

The log file already contains the timestamp. Why?
>

This comment assumed the output of pg_subscriber was also recorded to a file.
In that case, if the file has the same timestamp, they can be gathered in the
same place. It is not needed if the output is not recorded.


>
> 09.
> Not sure, but should we check max_slot_wal_keep_size of primary server? It can
> avoid to fail starting of logical replicaiton.

A broken physical replication *before* running this tool is its responsibility?
Hmm. We might add another check that can be noticed during dry run mode.
>

I thought that we should not generate any broken objects, but indeed, I'm not
sure it is in our scope. What do others think?


>
> 11.
> ```
> /*
> * Stop the subscriber if it is a standby server. Before executing the
> * transformation steps, make sure the subscriber is not running because
> * one of the steps is to modify some recovery parameters that require a
> * restart.
> */
> if (stat(pidfile, &statbuf) == 0)
> ```
>
> I kept just in case, but I'm not sure it is still needed. How do you think?
> Removing it can reduce an inclusion of pidfile.h.

Are you suggesting another way to check if the standby is up and running?
>

Running `pg_ctl stop` itself can detect whether the process is still alive.
It exits with 1 when the process is not there.

>
> I didn't notice the comment, but I am still not sure of the reason. Why must we reserve
> the slot until pg_subscriber finishes? IIUC, the slot would never be used; it
> is created only for getting a consistent_lsn, so we do not have to keep it.
> Also, just before that, logical replication slots for each database are created, so
> WAL records are surely reserved.
>

I want to confirm the conclusion - will you remove the creation of a transient slot?

Also (not tested yet), I'm now considering that we can reuse the primary_conninfo value.
We are assuming that the target server is a standby and that the current upstream server
will be converted into the publisher. In this case, the connection string is already
specified as primary_conninfo, so --publisher-conninfo may not be needed. The parameter
does not contain a database name, so --databases is still needed. I imagine something like:

1. Parse options
2. Turn on standby
3. Verify the standby
4. Turn off standby
5. Get primary_conninfo from standby
6. Connect to primary (concat of primary_conninfo and an option is used)
7. Verify the primary
...

What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/




RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Again, thanks for updating the patch! There are my random comments for v9.

01.
I cannot find your reply to my comment #7 [1], but you reverted the related changes.
I'm not sure whether you are still considering it or decided not to include the changes.
Can you clarify your opinion?
(This matters because the changes are large, so they affect other development...)

02.
```
+       <term><option>-t <replaceable class="parameter">seconds</replaceable></option></term>
+       <term><option>--timeout=<replaceable class="parameter">seconds</replaceable></option></term>
```

But the source code requires `--recovery-timeout`. Please update one of them.

03.
```
+ *    Create a new logical replica from a standby server
```

Junwang suggested changing this, but the change was reverted [2].
Can you clarify your opinion as well?

04.
```
+/*
+ * Is the source server ready for logical replication? If so, create the
+ * publications and replication slots in preparation for logical replication.
+ */
+static bool
+setup_publisher(LogicalRepInfo *dbinfo)
```

But this function also verifies the source server. I felt the verification should be
in a different function.

05.
```
+/*
+ * Is the target server ready for logical replication?
+ */
+static bool
+setup_subscriber(LogicalRepInfo *dbinfo)
```

Actually, this function does not set up the subscriber. It just verifies whether the
target can become a subscriber, right? It should be renamed.

06.
```
+    atexit(cleanup_objects_atexit);
```

The registration of the cleanup function happens too early. This sometimes triggers
a core dump. E.g.,

```
$ pg_subscriber --publisher-conninfo --subscriber-conninfo 'user=postgres port=5432' --verbose --database 'postgres'
--pgdata data_N2/
pg_subscriber: error: too many command-line arguments (first is "user=postgres port=5432")
pg_subscriber: hint: Try "pg_subscriber --help" for more information.
Segmentation fault (core dumped)

$ gdb ...
(gdb) bt
#0  cleanup_objects_atexit () at pg_subscriber.c:131
#1  0x00007fb982cffce9 in __run_exit_handlers () from /lib64/libc.so.6
#2  0x00007fb982cffd37 in exit () from /lib64/libc.so.6
#3  0x00000000004054e6 in main (argc=9, argv=0x7ffc59074158) at pg_subscriber.c:1500
(gdb) f 3
#3  0x00000000004054e6 in main (argc=9, argv=0x7ffc59074158) at pg_subscriber.c:1500
1500                    exit(1);
(gdb) list
1495            if (optind < argc)
1496            {
1497                    pg_log_error("too many command-line arguments (first is \"%s\")",
1498                                             argv[optind]);
1499                    pg_log_error_hint("Try \"%s --help\" for more information.", progname);
1500                    exit(1);
1501            }
1502
1503            /*
1504             * Required arguments
```

I still think it should be done just before the creation of objects [3].

07.
Missing a removal of publications on the standby.

08.
Missing registration of LogicalRepInfo in the typedefs.list.

09.
```
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_subscriber</command>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
```

Can you reply to my comment #2 [4]? I think the mandatory options should be written.

10.
Just to confirm - will you implement the start_standby/stop_standby functions in the next version?

[1]:
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/CAEG8a3%2BwL_2R8n12BmRz7yBP3EBNdHDhmdgxQFA9vS%2BzPR%2B3Kw%40mail.gmail.com
[3]:
https://www.postgresql.org/message-id/TY3PR01MB9889678E47B918F4D83A6FD8F57B2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[4]:
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/




Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Fri, Jan 26, 2024, at 4:55 AM, Hayato Kuroda (Fujitsu) wrote:
Again, thanks for updating the patch! There are my random comments for v9.

Thanks for checking v9. I already incorporated some of the points below into
the next patch. Give me a couple of hours to include some important points.

01.
I cannot find your reply to my comment #7 [1], but you reverted the related changes.
I'm not sure whether you are still considering it or decided not to include the changes.
Can you clarify your opinion?
(This matters because the changes are large, so they affect other development...)

It is still on my list. As I said in a previous email, I'm having a hard time
reviewing pieces from your 0002 patch because you included a bunch of things
in one patch.

02.
```
+       <term><option>-t <replaceable class="parameter">seconds</replaceable></option></term>
+       <term><option>--timeout=<replaceable class="parameter">seconds</replaceable></option></term>
```

But the source code requires `--recovery-timeout`. Please update one of them.

Oops. Fixed. My preference is --recovery-timeout because someone may later decide to
add a --timeout option to this tool.

03.
```
+ *    Create a new logical replica from a standby server
```

Junwang suggested changing this, but the change was reverted [2].
Can you clarify your opinion as well?

I'll review the documentation once I fix the code. Since the tool includes the
*create* verb in its name, it seems strange to use another verb (convert) in the
description. Maybe we should just remove the word *new*, and a long description
can explain that the action is to turn the standby (physical replica) into a logical
replica.

04.
```
+/*
+ * Is the source server ready for logical replication? If so, create the
+ * publications and replication slots in preparation for logical replication.
+ */
+static bool
+setup_publisher(LogicalRepInfo *dbinfo)
```

But this function also verifies the source server. I felt the verification should be
in a different function.

I split setup_publisher() into check_publisher() and setup_publisher().

05.
```
+/*
+ * Is the target server ready for logical replication?
+ */
+static bool
+setup_subscriber(LogicalRepInfo *dbinfo)
```

Actually, this function does not set up the subscriber. It just verifies whether the
target can become a subscriber, right? It should be renamed.

I renamed setup_subscriber() -> check_subscriber().

06.
```
+    atexit(cleanup_objects_atexit);
```

The registration of the cleanup function happens too early. This sometimes triggers
a core dump. E.g.,

I forgot to apply the atexit() patch.

07.
Missing a removal of publications on the standby.

It was on my list to do. It will be in the next patch.

08.
Missing registration of LogicalRepInfo in the typedefs.list.

Done.

09.
```
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_subscriber</command>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
```

Can you reply to my comment #2 [4]? I think the mandatory options should be written.

I included the mandatory options into the synopsis.

10.
Just to confirm - will you implement the start_standby/stop_standby functions in the next version?

It is still on my list.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

>
It is still on my list. As I said in a previous email I'm having a hard time
reviewing pieces from your 0002 patch because you include a bunch of things
...
It is still on my list.
>

I understood that the patches we posted were hard to handle. Sorry for the inconvenience.
So I extracted only the changes related to them. Can you review them and include
them if they seem OK?

v10-0001: same as v9-0001.
0002: not related to our changes, but this fixes a small bug.
          The third argument of getopt_long was not correct.
0003: adds the start_standby/stop_standby functions.
0004: tries to refactor some structures.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/


Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Sun, Jan 28, 2024, at 10:10 PM, Euler Taveira wrote:
On Fri, Jan 26, 2024, at 4:55 AM, Hayato Kuroda (Fujitsu) wrote:
Again, thanks for updating the patch! There are my random comments for v9.

Thanks for checking v9. I already incorporated some of the points below into
the next patch. Give me a couple of hours to include some important points.

Here it is another patch that includes the following changes:

* rename the tool to pg_createsubscriber: it was the name with the most votes
  [1].
* fix recovery-timeout option [2]
* refactor: split setup_publisher into check_publisher (that contains only GUC
  checks) and setup_publisher (that does what the name suggests) [2]
* doc: verbose option can be specified multiple times [2]
* typedefs.list: add LogicalRepInfo [2]
* fix: register cleanup routine after the data structure (dbinfo) is assigned
  [2]
* doc: add mandatory options to synopsis [3]
* refactor: rename some options (such as publisher-conninfo to
  publisher-server) to use the same pattern as pg_rewind.
* feat: remove publications from subscriber [2]
* feat: use temporary replication slot to get consistent LSN [3]
* refactor: move subscriber setup to its own function
* refactor: stop standby server to its own function [2]
* refactor: start standby server to its own function [2]
* fix: getopt options [4]

There are a few open items on my list. I included v10-0002. I had already
included a refactoring to add start/stop functions, so I didn't include
v10-0003. I'll check v10-0004 tomorrow.

One open item that is worrying me is how to handle the pg_ctl timeout. This
patch does nothing about it, and the user should use the PGCTLTIMEOUT environment
variable to avoid the execution being canceled after 60 seconds (the default for
pg_ctl). Even if you set a high value, it might not be enough for cases like a
time-delayed replica. Maybe pg_ctl should accept 'no timeout' as a --timeout
option. I'll include this caveat in the documentation, but I'm afraid it is
not sufficient and we should provide a better way to handle this situation.



--
Euler Taveira

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch!

> One open item that is worrying me is how to handle the pg_ctl timeout. This
> patch does nothing about it, and the user should use the PGCTLTIMEOUT environment
> variable to avoid the execution being canceled after 60 seconds (the default for
> pg_ctl). Even if you set a high value, it might not be enough for cases like a
> time-delayed replica. Maybe pg_ctl should accept 'no timeout' as a --timeout
> option. I'll include this caveat in the documentation, but I'm afraid it is
> not sufficient and we should provide a better way to handle this situation.

I think you might be confused a bit. Even if recovery_min_apply_delay is set to,
e.g., 10 hours, pg_ctl can still start and stop the server. This is because the
walreceiver serializes changes as soon as they are received. The delay is applied by
the startup process. There is no unflushed data, so the server instance can be turned
off. If you meant the combination of recovery-timeout and a time-delayed replica,
yes, it would be likely to occur. But in that case, an option like --no-timeout is
dangerous. I think we should overwrite recovery_min_apply_delay to zero. Thoughts?

The part below contains my comments for v11-0001. Note that the ordering is random.

01. doc
```
    <group choice="req">
     <arg choice="plain"><option>-D</option> </arg>
     <arg choice="plain"><option>--pgdata</option></arg>
    </group>
```

According to other documentation, like pg_upgrade's, we do not write both the long
and short options in the synopsis section.

02. doc
```
  <para>
   <application>pg_createsubscriber</application> takes the publisher and subscriber
   connection strings, a cluster directory from a physical replica and a list of
   database names and it sets up a new logical replica using the physical
   recovery process.
  </para>

```

I found that you did not include my suggestion without saying why [1]. Do you dislike
the comment, or are you still considering it?

03. doc
```
      <term><option>-P  <replaceable class="parameter">connstr</replaceable></option></term>
```

Too many blanks after -P.

04. doc
```
    <para>
     <application>pg_createsubscriber</application> checks if the target data
     directory is used by a physical replica. Stop the physical replica if it is
     running. One of the next steps is to add some recovery parameters that
     requires a server start. This step avoids an error.
    </para>
```

I think the doc was not updated. Currently (not sure it is good), the physical replica
must be running before running pg_createsubscriber.

05. doc
```
     each specified database on the source server. The replication slot name
     contains a <literal>pg_createsubscriber</literal> prefix. These replication
     slots will be used by the subscriptions in a future step.  A temporary
```

According to the "Replication Slot Management" subsection in logical-replication.sgml,
the format of the names should be described explicitly, e.g.:

```
These replication slots have generated names: "pg_createsubscriber_%u_%d" (parameters: subscription oid, process id pid).
```

Publications, a temporary slot, and subscriptions should be described as well.

06. doc
```
     Next, <application>pg_createsubscriber</application> creates one publication
     for each specified database on the source server. Each publication
     replicates changes for all tables in the database. The publication name

```

Missing update. Publications are created before creating replication slots.

07. general
I think there are some commenting conventions in PG, but this file breaks them.

* Comments should be one line wherever possible.
* Periods must not be added when the comment is a single line.
* The initial character must be a capital letter.

08. general
Some pg_log_error() + exit(1) can be replaced with pg_fatal().


09. LogicalRepInfo
```
    char       *subconninfo;    /* subscription connection string for logical
                                 * replication */
```

As I posted in comment #8 [2], I don't think it is a "subscription connection". Also,
"for logical replication" is a bit misleading because it would not be passed to
workers.

10. get_base_conninfo
```
static char *
get_base_conninfo(char *conninfo, char *dbname, const char *noderole)
...
        /*
         * If --database option is not provided, try to obtain the dbname from
         * the publisher conninfo. If dbname parameter is not available, error
         * out.
         */

```

I'm not sure getting the dbname from the conninfo improves the user experience. I felt
it may target an unintended database.
(I still think publisher-server should be removed.)

11. check_data_directory
```
/*
 * Is it a cluster directory? These are preliminary checks. It is far from
 * making an accurate check. If it is not a clone from the publisher, it will
 * eventually fail in a future step.
 */
static bool
check_data_directory(const char *datadir)
```

We should also check whether pg_createsubscriber can create a file and a directory.
GetDataDirectoryCreatePerm() verifies this.

12. main
```
    /* Register a function to clean up objects in case of failure. */
    atexit(cleanup_objects_atexit);
```

According to the manpage, callback functions are not called when a process exits
abnormally due to a signal:

> Functions  registered  using atexit() (and on_exit(3)) are not called if a
> process terminates abnormally because of the delivery of a signal.

Do you have a good way to handle this case? One solution is to output the created
objects at some log level, but that consideration may be too much. Thoughts?

13. main
```
    /*
     * Create a temporary logical replication slot to get a consistent LSN.
```

Just to clarify - I still think the process should exit before this point in the case
of a dry run. In the case of pg_resetwal, the process exits before doing actual work
like RewriteControlFile().

14. main
```
     * XXX we might not fail here. Instead, we provide a warning so the user
     * eventually drops this replication slot later.
```

But there are possibilities of exit(1) in drop_replication_slot(). Is that acceptable?

15. wait_for_end_recovery
```
        /*
         * Bail out after recovery_timeout seconds if this option is set.
         */
        if (recovery_timeout > 0 && timer >= recovery_timeout)
```

Hmm, IIUC, it should be enabled by default [3]. Do you have anything in mind?

16. main
```
    /*
     * Create the subscription for each database on subscriber. It does not
     * enable it immediately because it needs to adjust the logical
     * replication start point to the LSN reported by consistent_lsn (see
     * set_replication_progress). It also cleans up publications created by
     * this tool and replication to the standby.
     */
    if (!setup_subscriber(dbinfo, consistent_lsn))
```

Subscriptions are created and the replication origin is moved forward here,
but the latter can be done only by a superuser. I felt that this should be
checked in check_subscriber().

17. main
```
    /*
     * Change system identifier.
     */
    modify_sysid(pg_resetwal_path, subscriber_dir);
```

Even if I execute without the -v option, output from the pg_resetwal command appears.
It seems a bit strange.

```
$ pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres" -S "port=5432 user=postgres" -d postgres
Write-ahead log reset
$
```

[1]:
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CAA4eK1JRgnhv_ySzuFjN7UaX9qxz5Hqcwew7Fk%3D%2BAbJbu0Kd9w%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/




Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:

On Thu, Jan 18, 2024 at 6:19 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> Very early in this thread, someone mentioned the name
> pg_create_subscriber, and of course there is pglogical_create_subscriber
> as the historical predecessor.  Something along those lines seems better
> to me.  Maybe there are other ideas.
>

I mentioned it upthread because of this pet project [1], which is one of the motivations behind upstreaming this facility.


--
Fabrízio de Royes Mello

Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:

On Wed, Jan 31, 2024 at 9:52 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Dear Euler,
>
> I extracted some review comments which may require much effort. I hope this makes them
> easy to review.
>
> 0001: not changed from yours.
> 0002: avoid using replication connections. Source: comment #3[1]
> 0003: Remove -P option and use primary_conninfo instead. Source: [2]
> 0004: Exit earlier when dry_run is specified. Source: [3]
> 0005: Refactor data structures. Source: [4]
>
> [1]: https://www.postgresql.org/message-id/TY3PR01MB9889593399165B9A04106741F5662%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [2]: https://www.postgresql.org/message-id/TY3PR01MB98897C85700C6DF942D2D0A3F5792%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [3]: https://www.postgresql.org/message-id/TY3PR01MB98897C85700C6DF942D2D0A3F5792%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [4]: https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
>

Hey folks,

Jumping into this a bit late here... I'm trying a simple pg_createsubscriber but getting an error:

~/pgsql took 19s
✦ ➜ pg_createsubscriber -d fabrizio -r -D /tmp/replica5434 -S 'host=/tmp port=5434'
pg_createsubscriber: error: could not create subscription "pg_createsubscriber_16384_695617" on database "fabrizio": ERROR:  syntax error at or near "/"
LINE 1: ..._16384_695617 CONNECTION 'user=fabrizio passfile='/home/fabr...
                                                             ^
pg_createsubscriber: error: could not drop replication slot "pg_createsubscriber_16384_695617" on database "fabrizio":
pg_createsubscriber: error: could not drop replication slot "pg_subscriber_695617_startpoint" on database "fabrizio": ERROR:  replication slot "pg_subscriber_695617_startpoint" does not exist

And the LOG contains the following:

~/pgsql took 12s
✦ ➜ cat /tmp/replica5434/pg_createsubscriber_output.d/server_start_20240131T110318.730.log
2024-01-31 11:03:19.138 -03 [695632] LOG:  starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-01-31 11:03:19.138 -03 [695632] LOG:  listening on IPv6 address "::1", port 5434
2024-01-31 11:03:19.138 -03 [695632] LOG:  listening on IPv4 address "127.0.0.1", port 5434
2024-01-31 11:03:19.158 -03 [695632] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5434"
2024-01-31 11:03:19.179 -03 [695645] LOG:  database system was shut down in recovery at 2024-01-31 11:03:18 -03
2024-01-31 11:03:19.180 -03 [695645] LOG:  entering standby mode
2024-01-31 11:03:19.192 -03 [695645] LOG:  redo starts at 0/4000028
2024-01-31 11:03:19.198 -03 [695645] LOG:  consistent recovery state reached at 0/504DB08
2024-01-31 11:03:19.198 -03 [695645] LOG:  invalid record length at 0/504DB08: expected at least 24, got 0
2024-01-31 11:03:19.198 -03 [695632] LOG:  database system is ready to accept read-only connections
2024-01-31 11:03:19.215 -03 [695646] LOG:  started streaming WAL from primary at 0/5000000 on timeline 1
2024-01-31 11:03:29.587 -03 [695645] LOG:  recovery stopping after WAL location (LSN) "0/504F260"
2024-01-31 11:03:29.587 -03 [695645] LOG:  redo done at 0/504F260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 10.39 s
2024-01-31 11:03:29.587 -03 [695645] LOG:  last completed transaction was at log time 2024-01-31 11:03:18.761544-03
2024-01-31 11:03:29.587 -03 [695646] FATAL:  terminating walreceiver process due to administrator command
2024-01-31 11:03:29.598 -03 [695645] LOG:  selected new timeline ID: 2
2024-01-31 11:03:29.680 -03 [695645] LOG:  archive recovery complete
2024-01-31 11:03:29.690 -03 [695643] LOG:  checkpoint starting: end-of-recovery immediate wait
2024-01-31 11:03:29.795 -03 [695643] LOG:  checkpoint complete: wrote 51 buffers (0.3%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.021 s, sync=0.034 s, total=0.115 s; sync files=17, longest=0.011 s, average=0.002 s; distance=16700 kB, estimate=16700 kB; lsn=0/504F298, redo lsn=0/504F298
2024-01-31 11:03:29.805 -03 [695632] LOG:  database system is ready to accept connections
2024-01-31 11:03:30.332 -03 [695658] ERROR:  syntax error at or near "/" at character 90
2024-01-31 11:03:30.332 -03 [695658] STATEMENT:  CREATE SUBSCRIPTION pg_createsubscriber_16384_695617 CONNECTION 'user=fabrizio passfile='/home/fabrizio/.pgpass' channel_binding=prefer host=localhost port=5432 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable dbname=fabrizio' PUBLICATION pg_createsubscriber_16384 WITH (create_slot = false, copy_data = false, enabled = false)

Seems we need to escape connection params similar to what we do in dblink [1]

Regards,


--
Fabrízio de Royes Mello

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Jan 31, 2024, at 11:25 AM, Fabrízio de Royes Mello wrote:
Jumping into this a bit late here... I'm trying a simple pg_createsubscriber but getting an error:

Try v11. It seems v12-0002 is not correct.

Seems we need to escape connection params similar we do in dblink [1]

I think it is a consequence of v12-0003. I didn't review v12 yet, but although I
added a comment saying it might be possible to use primary_conninfo, I'm
not 100% convinced that's the right direction.

        /*
         * TODO use primary_conninfo (if available) from subscriber and
         * extract publisher connection string. Assume that there are
         * identical entries for physical and logical replication. If there is
         * not, we would fail anyway.
         */


--
Euler Taveira

Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:


On Wed, Jan 31, 2024 at 11:35 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Jan 31, 2024, at 11:25 AM, Fabrízio de Royes Mello wrote:
>
> Jumping into this a bit late here... I'm trying a simple pg_createsubscriber but getting an error:
>
>
> Try v11. It seems v12-0002 is not correct.

Using v11 I'm getting this error:

~/pgsql took 22s
✦ ➜ pg_createsubscriber -d fabrizio -r -D /tmp/replica5434 -S 'host=/tmp port=5434' -P 'host=/tmp port=5432'
NOTICE:  changed the failover state of replication slot "pg_createsubscriber_16384_706609" on publisher to false
pg_createsubscriber: error: could not drop replication slot "pg_createsubscriber_706609_startpoint" on database "fabrizio": ERROR:  replication slot "pg_createsubscriber_706609_startpoint" does not exist
Write-ahead log reset

Attached the output log.

Regards,

--
Fabrízio de Royes Mello
Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Jan 31, 2024, at 11:55 AM, Fabrízio de Royes Mello wrote:

On Wed, Jan 31, 2024 at 11:35 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Jan 31, 2024, at 11:25 AM, Fabrízio de Royes Mello wrote:
>
> Jumping into this a bit late here... I'm trying a simple pg_createsubscriber but getting an error:
>
>
> Try v11. It seems v12-0002 is not correct.

Using v11 I'm getting this error:

~/pgsql took 22s
✦ ➜ pg_createsubscriber -d fabrizio -r -D /tmp/replica5434 -S 'host=/tmp port=5434' -P 'host=/tmp port=5432'
NOTICE:  changed the failover state of replication slot "pg_createsubscriber_16384_706609" on publisher to false
pg_createsubscriber: error: could not drop replication slot "pg_createsubscriber_706609_startpoint" on database "fabrizio": ERROR:  replication slot "pg_createsubscriber_706609_startpoint" does not exist
Write-ahead log reset

Hmm. I didn't try it with the failover patch that was recently applied. Did you
have any special configuration on the primary?


--
Euler Taveira

Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:

On Wed, Jan 31, 2024 at 12:38 PM Euler Taveira <euler@eulerto.com> wrote:
>
>
> Hmm. I didn't try it with the failover patch that was recently applied. Did you
> have any special configuration on the primary?
>

Nothing special; here are the configurations I've changed after bootstrap:

port = '5432'
wal_level = 'logical'
max_wal_senders = '8'
max_replication_slots = '6'
hot_standby_feedback = 'on'
max_prepared_transactions = '10'
max_locks_per_transaction = '512'

Regards,

--
Fabrízio de Royes Mello

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Jan 30, 2024, at 6:26 AM, Hayato Kuroda (Fujitsu) wrote:
> One open item that is worrying me is how to handle the pg_ctl timeout. This
> patch does nothing about it, and the user should use the PGCTLTIMEOUT environment
> variable to avoid the execution being canceled after 60 seconds (the default for
> pg_ctl). Even if you set a high value, it might not be enough for cases like a
> time-delayed replica. Maybe pg_ctl should accept 'no timeout' as a --timeout
> option. I'll include this caveat in the documentation, but I'm afraid it is
> not sufficient and we should provide a better way to handle this situation.

I think you might be confused a bit. Even if recovery_min_apply_delay is set to,
e.g., 10 hours, pg_ctl can still start and stop the server. This is because the
walreceiver serializes changes as soon as they are received. The delay is applied by
the startup process. There is no unflushed data, so the server instance can be turned
off. If you meant the combination of recovery-timeout and a time-delayed replica,
yes, it would be likely to occur. But in that case, an option like --no-timeout is
dangerous. I think we should overwrite recovery_min_apply_delay to zero. Thoughts?

I didn't provide the whole explanation. I'm envisioning the use case where pg_ctl
doesn't reach the consistent state and the timeout is reached (the consequence
is that pg_createsubscriber aborts the execution). It might occur on a busy
server. The probability that it occurs with the current code is low (the LSN gap
for recovery is small). Maybe I'm anticipating issues for when base backup
support is added, but it is better to raise concerns during development.

The part below contains my comments for v11-0001. Note that the ordering is random.

Hayato, thanks for reviewing v11.

01. doc
```
    <group choice="req">
     <arg choice="plain"><option>-D</option> </arg>
     <arg choice="plain"><option>--pgdata</option></arg>
    </group>
```

According to other documentation, like pg_upgrade's, we do not write both the long
and short options in the synopsis section.

pg_upgrade doesn't, but others do, like pg_rewind, pg_resetwal, pg_controldata,
and pg_checksums. It seems newer tools tend to provide both short and long options.

02. doc
```
  <para>
   <application>pg_createsubscriber</application> takes the publisher and subscriber
   connection strings, a cluster directory from a physical replica and a list of
   database names and it sets up a new logical replica using the physical
   recovery process.
  </para>

```

I found that you did not include my suggestion without saying why [1]. Do you dislike
the comment, or are you still considering it?

Documentation is on my list. I didn't fix the documentation because some design
decisions changed. I'm still working on it.

03. doc
```
      <term><option>-P  <replaceable class="parameter">connstr</replaceable></option></term>
```

Too many blanks after -P.

Fixed.

[documentation related items will be addressed later...]


07. general
I think there are some commenting conventions in PG, but this file breaks them.

It is on my list.

08. general
Some pg_log_error() + exit(1) can be replaced with pg_fatal().

Done. I kept a few pg_log_error() + exit() because there is no
pg_fatal_and_hint() function.


09. LogicalRepInfo
```
char    *subconninfo; /* subscription connection string for logical
* replication */
```

As I posted in comment #8 [2], I don't think it is a "subscription connection". Also,
"for logical replication" is a bit misleading because it would not be passed to
workers.

Done.

s/publication/publisher/
s/subscription/subscriber/

10. get_base_conninfo
```
static char *
get_base_conninfo(char *conninfo, char *dbname, const char *noderole)
...
/*
* If --database option is not provided, try to obtain the dbname from
* the publisher conninfo. If dbname parameter is not available, error
* out.
*/

```

I'm not sure getting dbname from the conninfo improves the user experience. I felt
it may target an unintended database.
(I still think the publisher-server option should be removed)

Why not? A single database is a common setup. It is only unintended if you don't
document it accordingly. I'll make sure it is advertised in the --database and
--publisher-server options.

11. check_data_directory
```
/*
* Is it a cluster directory? These are preliminary checks. It is far from
* making an accurate check. If it is not a clone from the publisher, it will
* eventually fail in a future step.
*/
static bool
check_data_directory(const char *datadir)
```

We should also check whether pg_createsubscriber can create files and directories.
GetDataDirectoryCreatePerm() verifies it.

Good point. It is included in the next patch.

12. main
```
/* Register a function to clean up objects in case of failure. */
atexit(cleanup_objects_atexit);
```

According to the manpage, functions registered with atexit() are not called when
the process terminates because of a signal:

> Functions  registered  using atexit() (and on_exit(3)) are not called if a
> process terminates abnormally because of the delivery of a signal.

Do you have a good way to handle this case? One solution is to log the created
objects at some level, but that may be overkill. Thoughts?

Nothing? If you interrupt the execution, there will be objects left behind, and
you, as the one who decided to interrupt it, have to clean things up. What do you
expect this tool to do? The documentation will provide guidance on the object
name patterns this tool uses so you can check for leftover objects. Someone
could argue that this is a valid feature request, but IMO it is not at the top
of the list.

13. main
```
/*
* Create a temporary logical replication slot to get a consistent LSN.
```

Just to clarify: I still think the process should exit before this point in the
dry-run case. pg_resetwal, for instance, exits before doing the actual work like
RewriteControlFile().

Why? Are you suggesting that the dry run mode covers just the verification
part? If so, it is not a dry run mode. I would expect it to run until the end
(or until it accomplishes its goal) but *not* modify data. For pg_resetwal,
the modification is one of the last steps, and the other ones that are skipped
(the KillFoo functions) modify data. It ends the dry run when it accomplishes
its goal (obtaining the new control data values). If we stop earlier, some of
the additional steps won't be covered by the dry run mode, and a failure could
occur that would have been detected had we run a few more steps.

14. main
```
* XXX we might not fail here. Instead, we provide a warning so the user
* eventually drops this replication slot later.
```

But there are code paths that exit(1) in drop_replication_slot(). Is that acceptable?

No, there aren't.

15. wait_for_end_recovery
```
/*
* Bail out after recovery_timeout seconds if this option is set.
*/
if (recovery_timeout > 0 && timer >= recovery_timeout)
```

Hmm, IIUC, it should be enabled by default [3]. Do you have anything in mind?

Why? See [1]. I prefer the kind mode (always wait until recovery ends), but you
and Amit are proposing a more aggressive mode. The proposal (-t 60) seems ok
right now; however, if the goal is to provide base backup support in the future,
you will certainly have to add --recovery-timeout on big clusters or those with
heavy workloads, because the base backup runs between replication slot creation
and obtaining the consistent LSN. Of course, we can change the default when base
backup support is added.

16. main
```
/*
* Create the subscription for each database on subscriber. It does not
* enable it immediately because it needs to adjust the logical
* replication start point to the LSN reported by consistent_lsn (see
* set_replication_progress). It also cleans up publications created by
* this tool and replication to the standby.
*/
if (!setup_subscriber(dbinfo, consistent_lsn))
```

Subscriptions would be created and the replication origin would be moved forward
here, but the latter can only be done by a superuser. I felt that this should be
checked in check_subscriber().

Good point. I included a check for pg_create_subscription role and CREATE
privilege on the specified database.

17. main
```
/*
* Change system identifier.
*/
modify_sysid(pg_resetwal_path, subscriber_dir);
```

Even if I execute without the -v option, output from the pg_resetwal command
appears. It seems a bit strange.

pg_resetwal uses printf and there is no prefix identifying that the message
comes from pg_resetwal. That message has been bothering me for a while, so let's
send it to /dev/null. I'll include it in the next patch.

```
    RewriteControlFile();
    KillExistingXLOG();
    KillExistingArchiveStatus();
    KillExistingWALSummaries();
    WriteEmptyXLOG();

    printf(_("Write-ahead log reset\n"));
    return 0;
```




--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Fabrízio,

Thanks for reporting. I understood that the issue occurred on v11 and v12.
I will try to reproduce and check the reason.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for giving comments! I want to reply some of them.

>
I didn't provide the whole explanation. I'm envisioning the use case where pg_ctl
doesn't reach the consistent state and the timeout is reached (the consequence
is that pg_createsubscriber aborts the execution). It might occur on a busy
server. The probability that it occurs with the current code is low (the LSN gap
for recovery is small). Maybe I'm anticipating issues for when base backup
support is added, but it is better to raise concerns during development.
>

Hmm, actually I didn't know about that case. Thanks for the explanation. I want
to see how you describe it in the docs.

>
pg_upgrade doesn't, but others do, like pg_rewind, pg_resetwal, pg_controldata,
and pg_checksums. Newer tools tend to provide both short and long options.
>

Oh, you are right.

>
Nothing? If you interrupt the execution, there will be objects left behind, and
you, as the one who decided to interrupt it, have to clean things up. What do you
expect this tool to do? The documentation will provide guidance on the object
name patterns this tool uses so you can check for leftover objects. Someone
could argue that this is a valid feature request, but IMO it is not at the top
of the list.
>

OK, so let's keep current style.

>
Why? Are you suggesting that the dry run mode covers just the verification
part? If so, it is not a dry run mode. I would expect it to run until the end
(or until it accomplishes its goal) but *not* modify data. For pg_resetwal,
the modification is one of the last steps, and the other ones that are skipped
(the KillFoo functions) modify data. It ends the dry run when it accomplishes
its goal (obtaining the new control data values). If we stop earlier, some of
the additional steps won't be covered by the dry run mode, and a failure could
occur that would have been detected had we run a few more steps.
>

Yes, that was my expectation. I'm still not sure which problems can be detected
by the dry run, but we can keep it for now.

>
Why? See [1]. I prefer the kind mode (always wait until recovery ends), but you
and Amit are proposing a more aggressive mode. The proposal (-t 60) seems ok
right now; however, if the goal is to provide base backup support in the future,
you will certainly have to add --recovery-timeout on big clusters or those with
heavy workloads, because the base backup runs between replication slot creation
and obtaining the consistent LSN. Of course, we can change the default when base
backup support is added.
>

Sorry, I missed your previous post. Let's keep yours.

>
Good point. I included a check for pg_create_subscription role and CREATE
privilege on the specified database.
>

I'm not sure, but can the replication origin functions be executed with those
privileges? According to the docs [1], they seem unrelated.

[1]: https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-REPLICATION

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/




Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Jan 31, 2024, at 11:09 PM, Hayato Kuroda (Fujitsu) wrote:
>
Why? Are you suggesting that the dry run mode covers just the verification
part? If so, it is not a dry run mode. I would expect it to run until the end
(or until it accomplishes its goal) but *not* modify data. For pg_resetwal,
the modification is one of the last steps, and the other ones that are skipped
(the KillFoo functions) modify data. It ends the dry run when it accomplishes
its goal (obtaining the new control data values). If we stop earlier, some of
the additional steps won't be covered by the dry run mode, and a failure could
occur that would have been detected had we run a few more steps.
>

Yes, that was my expectation. I'm still not sure which problems can be detected
by the dry run, but we can keep it for now.

The main goal is to have information for troubleshooting.


Good point. I included a check for pg_create_subscription role and CREATE
privilege on the specified database.
>

I'm not sure, but can the replication origin functions be executed with those
privileges? According to the docs [1], they seem unrelated.

Hmm. No. :( Better add this check too.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Fabrízio, Euler,

I think you set primary_slot_name on the standby server, right?
While reading codes, I found below line in v11-0001.
```
    if (primary_slot_name != NULL)
    {
        conn = connect_database(dbinfo[0].pubconninfo);
        if (conn != NULL)
        {
            drop_replication_slot(conn, &dbinfo[0], temp_replslot);
        }
```

Since temp_replslot is now a temporary slot, it is removed automatically, so
this function causes the error: replication slot "pg_createsubscriber_%u_startpoint" does not exist.
Also, the physical slot still remains on the primary.
In short, "temp_replslot" should be "primary_slot_name".

PSA a script file for reproducing.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Fabrízio, Euler,

I made fix patches to solve the issues reported by Fabrízio.

* v13-0001: Same as v11-0001 made by Euler.
* v13-0002: Fixes ERRORs while dropping replication slots [1].
           If you want to see the code we have agreed on, please apply up to 0002.
=== experimental patches ===
* v13-0003: Avoids using replication connections. The issue [2] was solved in my environment.
* v13-0004: Removes -P option and use primary_conninfo instead.
* v13-0005: Refactors data structures

[1]: https://www.postgresql.org/message-id/CAFcNs%2BrSG9DcEewsoA%3D85DXhSRh%2BnyKrrcr64FEDytcZf6QaEQ%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAFcNs%2BpPtw%2By7Be00BK0MBpHhLk2s66tLM286g%3Dk5rew8kUxjg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Feb 1, 2024, at 9:47 AM, Hayato Kuroda (Fujitsu) wrote:
I made fix patches to solve the issues reported by Fabrízio.

* v13-0001: Same as v11-0001 made by Euler.
* v13-0002: Fixes ERRORs while dropping replication slots [1].
           If you want to see the code we have agreed on, please apply up to 0002.
=== experimental patches ===
* v13-0003: Avoids using replication connections. The issue [2] was solved in my environment.
* v13-0004: Removes -P option and use primary_conninfo instead.
* v13-0005: Refactors data structures

Thanks for rebasing the proposed patches. I'm attaching a new patch.

As I said in the previous email [1] I fixed some issues from your previous review:

* use pg_fatal() if possible. In some cases pg_log_error() + exit(1) couldn't be
  replaced because a hint is required.
* pg_resetwal output. Send standard output to /dev/null to avoid extra message.
* check privileges. Make sure the current role can execute CREATE SUBSCRIPTION
  and pg_replication_origin_advance().
* log directory. Refactor code that setup the log file used as server log.
* run with restricted token (Windows).
* v13-0002. Merged.
* v13-0003. I applied a modified version. I returned only the required
  information for each query.
* group command-line options into a new struct CreateSubscriberOptions. The
  exception is the dry_run option.
* rename functions that obtain system identifier.

WIP

I'm still working on the data structures to group options. I don't like the way
they were grouped in v13-0005; there are too many levels to reach the database
name. The setup_subscriber() function requires all 3 data structures.

The documentation update is almost there. I will include the modifications in
the next patch.

Regarding v13-0004, it seems like a good UI; that's why I wrote a comment about
it. However, it comes with a restriction: it requires a similar HBA rule for
both regular and replication connections. Is that an acceptable restriction? We
might paint ourselves into a corner. A reasonable proposal is not to remove this
option but to make it optional. If it is not provided, primary_conninfo is used.



--
Euler Taveira

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch!

>
I'm still working on the data structures to group options. I don't like the way
they were grouped in v13-0005; there are too many levels to reach the database
name. The setup_subscriber() function requires all 3 data structures.
>

Right, your refactoring has fewer levels, so I'll pause revising my refactoring
patch.

>
The documentation update is almost there. I will include the modifications in
the next patch.
>

OK. I think it should be fixed before native speakers join the thread.

>
Regarding v13-0004, it seems like a good UI; that's why I wrote a comment about
it. However, it comes with a restriction: it requires a similar HBA rule for
both regular and replication connections. Is that an acceptable restriction? We
might paint ourselves into a corner. A reasonable proposal is not to remove this
option but to make it optional. If it is not provided, primary_conninfo is used.
>

I didn't have that point of view. However, it is not related to whether -P
exists or not. Even v14-0001 requires the primary to accept both normal and
replication connections. If we want to avoid that, the connection from
pg_createsubscriber can be switched back to a replication connection.
(I felt we do not have to use the replication protocol even if we change the
connection mode.)

The motivation for removing -P is to ensure the consistency of the nodes.
pg_createsubscriber assumes that the -P option can connect to the upstream node,
but no one checks it. Parsing two connection strings may be a solution, but it
would be confusing, e.g., what if some options differ?
I think using the same parameter is the simplest solution.

And below part contains my comments for v14.

01.
```
char        temp_replslot[NAMEDATALEN] = {0};
```

I found that no one refers to the name of the temporary slot. Can we remove the variable?

02.
```
    CreateSubscriberOptions opt;
...
    memset(&opt, 0, sizeof(CreateSubscriberOptions));

    /* Default settings */
    opt.subscriber_dir = NULL;
    opt.pub_conninfo_str = NULL;
    opt.sub_conninfo_str = NULL;
    opt.database_names = (SimpleStringList)
    {
        NULL, NULL
    };
    opt.retain = false;
    opt.recovery_timeout = 0;
```

Initialization with `CreateSubscriberOptions opt = {0};` seems sufficient;
all members are set to zero.

03.
```
/*
 * Is the standby server ready for logical replication?
 */
static bool
check_subscriber(LogicalRepInfo *dbinfo)
```

You said "target server must be a standby" in [1], but I cannot find a check for it.
IIUC, there are two approaches:
 a) check for the existence of "standby.signal" in the data directory
 b) call the SQL function pg_is_in_recovery()

04.
```
static char *pg_ctl_path = NULL;
static char *pg_resetwal_path = NULL;
```

I still think they can be combined as "bindir".

05.

```
    /*
     * Write recovery parameters.
...
        WriteRecoveryConfig(conn, opt.subscriber_dir, recoveryconfcontents);
```

WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but I'm not
sure that is good. These settings would remain on the new subscriber even after
pg_createsubscriber finishes. Can we avoid that? One idea is to pass these
parameters via the pg_ctl -o option, but that requires parsing the output of
GenerateRecoveryConfig() (all GUCs must be aligned like "-c XXX -c XXX -c XXX...").

06.
```
static LogicalRepInfo *store_pub_sub_info(SimpleStringList dbnames, const char *pub_base_conninfo, const char
*sub_base_conninfo);
...
static void modify_subscriber_sysid(const char *pg_resetwal_path, CreateSubscriberOptions opt);
...
static void wait_for_end_recovery(const char *conninfo, CreateSubscriberOptions opt);
```

Function arguments should not be structs, because they are passed by value;
they should be pointers. Or, for modify_subscriber_sysid() and
wait_for_end_recovery(), we can pass only the values that are actually used.

07.
```
static char *get_base_conninfo(char *conninfo, char *dbname,
                               const char *noderole);
```

I'm not sure noderole should be passed here; it is used only for logging.
Can we log the string before calling the function instead?
(The parameter is not needed anymore if -P is removed)

08.
The terminology is still not consistent. Some functions call the target
"standby" while others call it "subscriber".

09.
v14 does not work if the standby server already has recovery_target* options
set. PSA the reproducer. I considered two approaches:

 a) raise an ERROR when these parameters are set; check_subscriber() can do it
 b) overwrite these GUCs with empty strings.

10.
The execution always fails if the user runs --dry-run just beforehand, because
pg_createsubscriber stops the standby anyway. Doing a dry run first is a quite
normal use case, so the current implementation does not seem user-friendly. How
should we fix it? Below are my ideas:

 a) avoid stopping the standby in the dry_run case: seems possible.
 b) accept the case where the standby is already stopped: seems possible.
 c) start the standby at the end of the run: how should arguments like pg_ctl -l be specified?

My top-up patches fix some issues.

v15-0001: same as v14-0001
=== experimental patches ===
v15-0002: Use replication connections when connecting to the primary.
          Connections to the standby are unchanged because the standby/subscriber
          does not require that type of connection, in principle.
          If we can accept connecting to the subscriber in replication mode,
          this can be simplified.
v15-0003: Remove -P and use primary_conninfo instead. Same as v13-0004.
v15-0004: Check whether the target is really a standby. This is done via pg_is_in_recovery().
v15-0005: Avoid stopping/starting the standby server in dry_run mode,
          i.e., approach a) in #10.
v15-0006: Overwrite recovery parameters, i.e., approach b) in #9.

[1]: https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Attachment

Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Fri, Feb 2, 2024 at 3:11 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> [...]
>
> My top-up patches fix some issues.
>
> v15-0001: same as v14-0001
> === experimental patches ===
> v15-0002: Use replication connections when we connects to the primary.
>           Connections to standby is not changed because the standby/subscriber
>           does not require such type of connection, in principle.
>           If we can accept connecting to subscriber with replication mode,
>           this can be simplified.
> v15-0003: Remove -P and use primary_conninfo instead. Same as v13-0004
> v15-0004: Check whether the target is really standby. This is done by pg_is_in_recovery()
> v15-0005: Avoid stopping/starting standby server in dry_run mode.
>           I.e., approach a). in #10 is used.
> v15-0006: Overwrite recovery parameters. I.e., aproach b). in #9 is used.
>
> [1]: https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com


While reviewing the v15 patches, I discovered that the subscription connection
string now includes a lot of options which are not required:
v15-0001
postgres=# select subname, subconninfo from pg_subscription;
            subname            |               subconninfo
-------------------------------+------------------------------------------
pg_createsubscriber_5_1867633 | host=localhost port=5432 dbname=postgres
(1 row)



v15-0001+0002+0003
postgres=# select subname, subconninfo from pg_subscription;
            subname            |                       subconninfo
-------------------------------+------------------------------------------------
 pg_createsubscriber_5_1895366 | user=shubham passfile='/home/shubham/.pgpass' channel_binding=prefer host=127.0.0.1 port=5432 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable dbname=postgres
(1 row)


Here, we can see that channel_binding, sslmode, sslcertmode, sslsni,
gssencmode, krbsrvname, etc. are getting included. This does not look
intentional; we should keep the subscription connection string the same as in
v15-0001.

Thanks and Regards,
Shubham Khanna.





RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

> 03.
> ```
> /*
>  * Is the standby server ready for logical replication?
>  */
> static bool
> check_subscriber(LogicalRepInfo *dbinfo)
> ```
> 
> You said "target server must be a standby" in [1], but I cannot find checks for it.
> IIUC, there are two approaches:
>  a) check the existence "standby.signal" in the data directory
>  b) call an SQL function "pg_is_in_recovery"

I found that the current code (HEAD+v14-0001) gets stuck or times out because
recovery never finishes. The attached script emulates the case where the
standby.signal file is missing. The script always fails with the output below:

```
...
pg_createsubscriber: waiting the postmaster to reach the consistent state
pg_createsubscriber: postmaster was stopped
pg_createsubscriber: error: recovery timed out
...
```

I also attached the server log taken during pg_createsubscriber; we can see that
the walreceiver exits before receiving all the required changes. I cannot
pinpoint the code path yet, but the inconsistency leads to a situation in which
the walreceiver has exited while the startup process remains. This also suggests
that the recovery status must be checked in check_subscriber().

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

Thanks for testing our codes!

> While reviewing the v15 patches, I discovered that the subscription connection
> string now includes a lot of options which are not required:
> v15-0001
> postgres=# select subname, subconninfo from pg_subscription;
>             subname            |               subconninfo
> -------------------------------+------------------------------------------
> pg_createsubscriber_5_1867633 | host=localhost port=5432 dbname=postgres
> (1 row)
> 
> 
> 
> v15-0001+0002+0003
> postgres=# select subname, subconninfo from pg_subscription;
>             subname            |                       subconninfo
> -------------------------------+------------------------------------------------
>  pg_createsubscriber_5_1895366 | user=shubham passfile='/home/shubham/.pgpass' channel_binding=prefer host=127.0.0.1 port=5432 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable dbname=postgres
> (1 row)
> 
> 
> Here, we can see that channel_binding, sslmode, sslcertmode, sslsni,
> gssencmode, krbsrvname, etc. are getting included. This does not look
> intentional; we should keep the subscription connection string the same as in
> v15-0001.

You should attach the reproducer script. I suspect you used the pg_basebackup
-R command to set up the standby. In that case, it is intentional: these
settings are not added by pg_createsubscriber, they come from pg_basebackup.
If you set primary_conninfo manually, as in the attached script, these settings
do not appear.

In the first place, the listed options (e.g., passfile, channel_binding,
sslmode, sslcompression, sslcertmode, etc.) are set whenever you connect to the
database via libpq functions. PQconninfoOptions in fe-connect.c lists the
parameters and their default values.

v15-0003 reuses primary_conninfo for the subconninfo attribute. primary_conninfo
is set by pg_basebackup when the '-R' option is specified.
Its content is built in GenerateRecoveryConfig(), which takes the parameters
from PQconninfo(). That function returns all the libpq connection parameters,
even those set to their defaults, so primary_conninfo looks longer.

I don't think this works incorrectly. Users can still set an arbitrary
connection string as primary_conninfo; you just end up with a longer string
unintentionally.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

> My top-up patches fix some issues.
>
> v15-0001: same as v14-0001
> === experimental patches ===
> v15-0002: Use replication connections when we connect to the primary.
>           Connections to the standby are not changed because the standby/subscriber
>           does not require that type of connection, in principle.
>           If we can accept connecting to subscriber with replication mode,
>           this can be simplified.
> v15-0003: Remove -P and use primary_conninfo instead. Same as v13-0004
> v15-0004: Check whether the target is really standby. This is done by pg_is_in_recovery()
> v15-0005: Avoid stopping/starting standby server in dry_run mode.
>           I.e., approach a). in #10 is used.
> v15-0006: Overwrite recovery parameters. I.e., approach b). in #9 is used.
>
> [1]: https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com
>

I have created a topup patch 0007 on top of v15-0006.

I revived the patch which removes the -S option and adds some options instead.
The patch adds options for --port, --username and --socketdir. It also ensures
that no one can connect to the standby while pg_createsubscriber is running, by
setting listen_addresses, unix_socket_permissions, and
unix_socket_directories.

Thanks and Regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote:
Thanks for updating the patch!

Thanks for taking a look.

>
I'm still working on the data structures to group options. I don't like the way
it was grouped in v13-0005. There are too many levels to reach the database
name. The setup_subscriber() function requires the 3 data structures.
>

Right, your refactoring uses fewer levels, so I have paused revising my
refactoring patch.

I didn't complete this task yet so I didn't include it in this patch.

>
The documentation update is almost there. I will include the modifications in
the next patch.
>

OK. I think it should be modified before native speakers join the thread.

Same for this one.

>
Regarding v13-0004, it seems a good UI; that's why I wrote a comment about it.
However, it comes with a restriction that requires a similar HBA rule for both
regular and replication connections. Is that an acceptable restriction? We
might paint ourselves into a corner. A reasonable proposal is not to remove
this option. Instead, it should be optional. If it is not provided,
primary_conninfo is used.
>

I didn't have such a point of view. However, it is not related to whether -P
exists or not. Even v14-0001 requires the primary to accept both normal and
replication connections. If we want to avoid that, the connections from
pg_createsubscriber can be restored to replication connections.
(I felt we do not have to use the replication protocol even if we change the
connection mode.)

That's correct. We made a decision to use non-physical-replication connections
(besides the one used to connect primary <-> standby). Hence, my concern about
an HBA rule falls apart. I'm not convinced that using a modified
primary_conninfo is the only/right answer to fill the subscription connection
string. Physical and logical replication have different requirements when we
talk about users. Physical replication requires only the REPLICATION privilege.
On the other hand, to create a subscription you must have the privileges of the
pg_create_subscription role and also the CREATE privilege on the specified
database. Unless you are always recommending the superuser for this tool, I'm
afraid there will be cases where reusing primary_conninfo will be an issue. The
more I think about this UI, the more I think that, if it is not hundreds of
lines of code, it should use primary_conninfo when -P is not specified.

The motivation for removing -P is to ensure the consistency of nodes.
pg_createsubscriber assumes that the -P option can connect to the upstream
node, but nothing checks it. Parsing two connection strings may be a solution
but would be confusing. E.g., what if some options are different?
I think using the same parameter is the simplest solution.

Ugh. An error will occur the first time it tries to connect to the primary
(get_primary_sysid).

I found that nothing refers to the name of the temporary slot. Can we remove the variable?

It is gone. I refactored the create_logical_replication_slot function. The slot
name is assigned internally (no need for slot_name or temp_replslot) and the
temporary parameter is included.

Initialization by `CreateSubscriberOptions opt = {0};` seems enough.
All values are set to 0x0.

It is. However, I keep the assignments for 3 reasons: (a) there might be
parameters whose default value is not zero, (b) the standard does not say that
a null pointer must be represented by zero, and (c) there is no harm in being
paranoid during the initial assignment.

You said "target server must be a standby" in [1], but I cannot find a check
for it. IIUC, there are two approaches:
a) check for the existence of "standby.signal" in the data directory
b) call the SQL function "pg_is_in_recovery"

I applied v16-0004 that implements option (b).

I still think they can be combined as "bindir".

I applied a patch that has a single variable for BINDIR.

WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but I'm
not sure it is good. These settings would remain on the new subscriber even
after pg_createsubscriber finishes. Can we avoid that? I came up with passing
these parameters via the pg_ctl -o option, but it requires parsing the output
from GenerateRecoveryConfig() (all GUCs must be aligned like "-c XXX -c XXX
-c XXX...").

I applied a modified version of v16-0006.

Function arguments should not be structs because they are passed by value;
they should be pointers. Or, for modify_subscriber_sysid and
wait_for_end_recovery, we can pass only the values that are really used.

Done.

07.
```
static char *get_base_conninfo(char *conninfo, char *dbname,
   const char *noderole);
```

Not sure noderole should be passed here. It is used only for logging.
Can we output the string before calling the function?
(The parameter is not needed anymore if -P is removed.)

Done.

08.
The terminology is still not consistent. Some functions refer to the target as
"standby", while others call it "subscriber".

The terminology should reflect the actual server role. I'm calling it "standby"
if it is a physical replica and "subscriber" if it is a logical replica. Maybe
"standby" isn't clear enough.

09.
v14 does not work if the standby server already has recovery_target* options
set. PSA the reproducer. I considered two approaches:
a) raise an ERROR when these parameters are set; check_subscriber() can do it
b) overwrite these GUCs with empty strings

I prefer (b); that's exactly what you provided in v16-0006.

10.
Execution always fails if users ran --dry-run just before, because
pg_createsubscriber stops the standby anyway. Doing a dry run first is quite a
normal use case, so the current implementation does not seem user-friendly.
How should we fix it? The bullets below are my ideas:

a) avoid stopping the standby in case of dry_run: seems possible.
b) accept even if the standby is stopped: seems possible.
c) start the standby at the end of the run: how should arguments like
   pg_ctl -l be specified?

I prefer (a). I applied a slightly modified version of v16-0005.

This new patch contains the following changes:

* check whether the target is really a standby server (0004)
* refactor: pg_create_logical_replication_slot function
* use a single variable for pg_ctl and pg_resetwal directory
* avoid recovery errors applying default settings for some GUCs (0006)
* don't stop/start the standby in dry run mode (0005)
* miscellaneous fixes

I don't understand why v16-0002 is required. In a previous version, this patch
was using connections in logical replication mode. After some discussion we
decided to change it to regular connections and use SQL functions (instead of
replication commands). Is it a requirement for v16-0003?

I started reviewing v16-0007 but haven't finished yet. The general idea is ok.
However, I'm still worried about preventing some use cases if it provides only
the local connection option. What if you want to keep monitoring this instance
while the transformation is happening? Let's say it has a backlog that will
take some time to apply. Unless you have a local agent, you have no data about
this server until pg_createsubscriber terminates. Even a local agent might not
connect to the server unless you use the current port.


--
Euler Taveira

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shlok,

Thanks for updating the patch!

> I have created a topup patch 0007 on top of v15-0006.
> 
> I revived the patch which removes -S option and adds some options
> instead. The patch add option for --port, --username and --socketdir.
> This patch also ensures that anyone cannot connect to the standby
> during the pg_createsubscriber, by setting listen_addresses,
> unix_socket_permissions, and unix_socket_directories.

IIUC, there are two reasons why removing -S may be good:

* it forces users to specify a local connection, and
* it avoids connection establishment on the standby while pg_createsubscriber
  runs.

The first bullet is still valid, but we should describe it like pg_upgrade
does:

>
pg_upgrade will connect to the old and new servers several times, so you might
want to set authentication to peer in pg_hba.conf or use a ~/.pgpass file
(see Section 33.16).
>

Regarding the second bullet, this patch cannot ensure it. pg_createsubscriber
just accepts a port number on which the standby is already listening; it does
not change the port number. So any local application on the standby server can
connect to the server and may change primary_conninfo, primary_slot_name,
data, etc. So... what should the behavior be? What do others think?

Besides, 0007 does not follow the recent changes in 0001. E.g., options should
be recorded in CreateSubscriberOptions. Also, should we check the privileges
of the socket directory?

[1]:
https://www.postgresql.org/message-id/TY3PR01MB988902B992A4F2E99E1385EDF56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Sorry, our e-mails crossed. I will read your patch, but I want to reply to one
of your points first.

>
I started reviewing v16-0007 but haven't finished yet. The general idea is ok.
However, I'm still worried about preventing some use cases if it provides only
the local connection option. What if you want to keep monitoring this instance
while the transformation is happening? Let's say it has a backlog that will
take some time to apply. Unless you have a local agent, you have no data about
this server until pg_createsubscriber terminates. Even a local agent might not
connect to the server unless you use the current port.
>

(IIUC, 0007 could not overwrite a port number - refactoring is needed)

Ah, actually I did not have such a point of view. Assuming that a changed port
number can prevent connection establishment, there are four options:
a) Overwrite neither port nor listen_addresses. This allows monitoring by
   external agents, but someone can modify GUCs and data during the operation.
b) Overwrite port but not listen_addresses. Not sure this is useful...
c) Overwrite listen_addresses but not port. This allows monitoring by local
   agents, and we can partially protect the database. But there is still some
   room.
d) Overwrite both port and listen_addresses. This protects the database
   completely, but no one can monitor it.

Hmm, which one should be chosen? I prefer c) or d).
Do you know what pglogical_create_subscriber does?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

I have not finished reviewing, but I can reply to you first.

>
I didn't complete this task yet so I didn't include it in this patch.
>

Just to confirm: there is no need to forcibly include my refactoring patch;
you can modify it based on your own idea.

>
That's correct. We made a decision to use non-physical-replication connections
(besides the one used to connect primary <-> standby). Hence, my concern about
an HBA rule falls apart. I'm not convinced that using a modified
primary_conninfo is the only/right answer to fill the subscription connection
string. Physical and logical replication have different requirements when we
talk about users. Physical replication requires only the REPLICATION privilege.
On the other hand, to create a subscription you must have the privileges of the
pg_create_subscription role and also the CREATE privilege on the specified
database. Unless you are always recommending the superuser for this tool, I'm
afraid there will be cases where reusing primary_conninfo will be an issue. The
more I think about this UI, the more I think that, if it is not hundreds of
lines of code, it should use primary_conninfo when -P is not specified.
>

Valid point. It is a best practice for users to set minimal privileges for
each account. However...

>
Ugh. An error will occur the first time (get_primary_sysid) it tries to connect
to primary.
>

I'm not sure that is correct. I think there are several patterns.

a) The -P option specified a ghost server, i.e., pg_createsubscriber cannot
   connect to anything. In this case, get_primary_sysid() would fail because
   connect_database() would fail.

b) The -P option specified an existing server, but it is not my upstream.
   get_primary_sysid() would succeed.
   In this case, there are two further branches:

   b-1) The nodes have different system_identifiers. pg_createsubscriber would
        raise an ERROR and stop.
   b-2) The nodes have the same system_identifier (e.g., the nodes were
        generated from the same master). In this case, the current checks
        would pass but pg_createsubscriber would get stuck or reach the
        timeout. PSA the reproducer.

Can pg_createsubscriber check that the two nodes are actually connected by the
replication protocol? Or can we assume that all nodes surely have different
system_identifiers?

>
It is. However, I keep the assignments for 3 reasons: (a) there might be
parameters whose default value is not zero, (b) the standard does not say that
a null pointer must be represented by zero, and (c) there is no harm in being
paranoid during the initial assignment.
>

Hmm, so there is no need to use `= {0};`, but it's up to you. Also, the C99
standard [1] defines a null pointer constant as the integer constant 0:
```
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.
If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
```

>
> WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but I'm
> not sure it is good. These settings would remain on the new subscriber even
> after pg_createsubscriber finishes. Can we avoid that? I came up with passing
> these parameters via the pg_ctl -o option, but it requires parsing the output
> from GenerateRecoveryConfig() (all GUCs must be aligned like "-c XXX -c XXX
> -c XXX...").

I applied a modified version of v16-0006.
>

Sorry for the confusion; that is not a solution. This approach writes
parameter settings to postgresql.auto.conf, and they are never removed. The
overwritten settings would remain.

>
I don't understand why v16-0002 is required. In a previous version, this patch
was using connections in logical replication mode. After some discussion we
decided to change it to regular connections and use SQL functions (instead of
replication commands). Is it a requirement for v16-0003?
>

No, it is not required by 0003. I forgot the decision that we require DBAs to
allow normal connection establishment for pg_createsubscriber. Please ignore
it...

[1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf


Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Feb 7, 2024, at 2:31 AM, Hayato Kuroda (Fujitsu) wrote:
Ah, actually I did not have such a point of view. Assuming that a changed port
number can prevent connection establishment, there are four options:
a) Overwrite neither port nor listen_addresses. This allows monitoring by
   external agents, but someone can modify GUCs and data during the operation.
b) Overwrite port but not listen_addresses. Not sure this is useful...
c) Overwrite listen_addresses but not port. This allows monitoring by local
   agents, and we can partially protect the database. But there is still some
   room.
d) Overwrite both port and listen_addresses. This protects the database
   completely, but no one can monitor it.

Remember that the target server was a standby (read-only access). I don't
expect an application to try to modify it, unless it is a buggy application.
Regarding GUCs, almost all of them are PGC_POSTMASTER (so they cannot be
modified unless the server is restarted). The ones that are not PGC_POSTMASTER
do not affect the pg_createsubscriber execution [1].

postgres=# select name, setting, context from pg_settings where name in ('max_replication_slots', 'max_logical_replication_workers', 'max_worker_processes', 'max_sync_workers_per_subscription', 'max_parallel_apply_workers_per_subscription');
                    name                     | setting |  context   
---------------------------------------------+---------+------------
max_logical_replication_workers             | 4       | postmaster
max_parallel_apply_workers_per_subscription | 2       | sighup
max_replication_slots                       | 10      | postmaster
max_sync_workers_per_subscription           | 2       | sighup
max_worker_processes                        | 8       | postmaster
(5 rows)

I'm just pointing out that this case is different from pg_upgrade (from which
this idea was taken). I'm not saying it's a bad idea. I'm just arguing that you
might be preventing some read-only access (monitoring) when it is perfectly
fine to connect to the database and execute queries. As I said before, the
current UI allows anyone to set up the standby to accept only local
connections. Of course, it is an extra step, but it is possible. However, once
you apply v16-0007, there is no option but to use only local connections
during the transformation. Is that an acceptable limitation?

On reflection, I don't expect a big window

1802     /*
1803      * Start subscriber and wait until accepting connections.
1804      */
1805     pg_log_info("starting the subscriber");
1806     if (!dry_run)
1807         start_standby_server(pg_bin_dir, opt.subscriber_dir, server_start_log);
1808 
1809     /*
1810      * Waiting the subscriber to be promoted.
1811      */
1812     wait_for_end_recovery(dbinfo[0].subconninfo, pg_bin_dir, &opt);
.
.
.
1845     /*
1846      * Stop the subscriber.
1847      */
1848     pg_log_info("stopping the subscriber");
1849     if (!dry_run)
1850         stop_standby_server(pg_bin_dir, opt.subscriber_dir);

... mainly because the majority of the time will be spent in
wait_for_end_recovery() if the server takes some time to reach a consistent
state (and during this phase it cannot accept connections anyway). Aren't we
worrying too much about it?

Hmm, which one should be chosen? I prefer c) or d).
Do you know how pglogical_create_subscriber does?

pglogical_create_subscriber does nothing [2][3].




--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

>
Remember that the target server was a standby (read-only access). I don't
expect an application to try to modify it, unless it is a buggy application.
>

What if a client modifies the data just after the promotion?
Naively, all the changes would be accepted, but are there any issues?

>
Regarding
GUCs, almost all of them are PGC_POSTMASTER (so they cannot be modified unless
the server is restarted). The ones that are not PGC_POSTMASTER do not affect
the pg_createsubscriber execution [1].
>

IIUC, primary_conninfo and primary_slot_name are PGC_SIGHUP.

>
I'm just pointing out that this case is different from pg_upgrade (from which
this idea was taken). I'm not saying it's a bad idea. I'm just arguing that you
might be preventing some read-only access (monitoring) when it is perfectly
fine to connect to the database and execute queries. As I said before, the
current UI allows anyone to set up the standby to accept only local
connections. Of course, it is an extra step, but it is possible. However, once
you apply v16-0007, there is no option but to use only local connections
during the transformation. Is that an acceptable limitation?
>

My remaining concern is written above. If those cases are not problematic, we
may not have to restrict connections for now. In that case, the changes

1) overwriting the port number,
2) setting listen_addresses = ''

are not needed, right? IIUC, inconsistency of -P may still be problematic.

>
pglogical_create_subscriber does nothing [2][3].
>

Oh, thanks.
Just to confirm: pglogical sets shared_preload_libraries to ''; should we
follow that or not?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Here are my minor comments on v17.

01.
```
/* Options */
static const char *progname;

static char *primary_slot_name = NULL;
static bool dry_run = false;

static bool success = false;

static LogicalRepInfo *dbinfo;
static int    num_dbs = 0;
```

The comment seems out-of-date. There is only one option.

02. check_subscriber and check_publisher

Missing pg_catalog prefix in some lines.

03. get_base_conninfo

I think dbname would not be set. IIUC, dbname should be a pointer to a pointer.


04.

I checked the coverage and found two functions that are never called:
 - drop_subscription
 - drop_replication_slot

Also, some cases are not tested. The bullets below show the notable ones for
me. (Some of them may not be needed, based on the discussions.)

* -r is specified
* -t is specified
* -P option contains dbname
* -d is not specified
* GUC settings are wrong
* primary_slot_name is specified on the standby
* standby server is not working

At the feature level, we may be able to check that the server log is surely
removed in case of success.

So, which tests should be added? drop_subscription() is called only during the
cleanup phase, so it may be difficult to test. As for the others, it seems
that -r and -t are not tested. GUC settings have many test cases, so I'm not
sure they all need to be covered. Based on this, the others can be tested.

PSA my top-up patch set.

V18-0001: same as your patch, v17-0001.
V18-0002: modify the alignment of codes.
V18-0003: change an argument of get_base_conninfo. Per comment 3.
=== experimental patches ===
V18-0004: Add testcases per comment 4.
V18-0005: Remove the -P option. I'm not sure it is needed, but I made it just in case.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Wed, 7 Feb 2024 at 10:24, Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote:
>
> Thanks for updating the patch!

Thanks for the updated patch. A few comments:
1) The cleanup handler flag should be reset, i.e., dbinfo->made_replslot =
false; should be set there; otherwise there will be an error while dropping
the replication slot in the error-cleanup flow:
+static void
+drop_replication_slot(PGconn *conn, LogicalRepInfo *dbinfo, const
char *slot_name)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+
+       Assert(conn != NULL);
+
+       pg_log_info("dropping the replication slot \"%s\" on database
\"%s\"", slot_name, dbinfo->dbname);
+
+       appendPQExpBuffer(str, "SELECT
pg_drop_replication_slot('%s')", slot_name);
+
+       pg_log_debug("command is: %s", str->data);

2) The cleanup handler flag should be reset, i.e., dbinfo->made_publication =
false; should be set there; otherwise there will be an error while dropping
the publication in the error-cleanup flow:
+/*
+ * Remove publication if it couldn't finish all steps.
+ */
+static void
+drop_publication(PGconn *conn, LogicalRepInfo *dbinfo)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+
+       Assert(conn != NULL);
+
+       pg_log_info("dropping publication \"%s\" on database \"%s\"",
dbinfo->pubname, dbinfo->dbname);
+
+       appendPQExpBuffer(str, "DROP PUBLICATION %s", dbinfo->pubname);
+
+       pg_log_debug("command is: %s", str->data);

3) The cleanup handler flag should be reset, i.e., dbinfo->made_subscription =
false; should be set there; otherwise there will be an error while dropping
the subscription in the error-cleanup flow:
+/*
+ * Remove subscription if it couldn't finish all steps.
+ */
+static void
+drop_subscription(PGconn *conn, LogicalRepInfo *dbinfo)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+
+       Assert(conn != NULL);
+
+       pg_log_info("dropping subscription \"%s\" on database \"%s\"",
dbinfo->subname, dbinfo->dbname);
+
+       appendPQExpBuffer(str, "DROP SUBSCRIPTION %s", dbinfo->subname);
+
+       pg_log_debug("command is: %s", str->data);

4) I was not sure if drop_publication is required here, as we will not create
any publication on the subscriber node:
+               if (dbinfo[i].made_subscription)
+               {
+                       conn = connect_database(dbinfo[i].subconninfo);
+                       if (conn != NULL)
+                       {
+                               drop_subscription(conn, &dbinfo[i]);
+                               if (dbinfo[i].made_publication)
+                                       drop_publication(conn, &dbinfo[i]);
+                               disconnect_database(conn);
+                       }
+               }

5) The connection should be closed in the error case:
+       /* secure search_path */
+       res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not clear search_path: %s",
PQresultErrorMessage(res));
+               return NULL;
+       }
+       PQclear(res);

6) There should be a line break before postgres_fe inclusion, to keep
it consistent:
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <signal.h>

7) These includes are not required:
7.a) #include <signal.h>
7.b) #include <sys/stat.h>
7.c) #include <sys/wait.h>
7.d) #include "access/xlogdefs.h"
7.e) #include "catalog/pg_control.h"
7.f) #include "common/file_utils.h"
7.g) #include "utils/pidfile.h"

+ *             src/bin/pg_basebackup/pg_createsubscriber.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <signal.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/wait.h>
+#include <time.h>
+
+#include "access/xlogdefs.h"
+#include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
+#include "common/connect.h"
+#include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
+#include "common/logging.h"
+#include "common/restricted_token.h"
+#include "fe_utils/recovery_gen.h"
+#include "fe_utils/simple_list.h"
+#include "getopt_long.h"
+#include "utils/pidfile.h"

8) POSTMASTER_STANDBY and POSTMASTER_FAILED are not used; are they required,
or kept for future use?
+enum WaitPMResult
+{
+       POSTMASTER_READY,
+       POSTMASTER_STANDBY,
+       POSTMASTER_STILL_STARTING,
+       POSTMASTER_FAILED
+};

9) pg_createsubscriber should be placed after pg_basebackup to maintain a
consistent order:
diff --git a/src/bin/pg_basebackup/.gitignore b/src/bin/pg_basebackup/.gitignore
index 26048bdbd8..b3a6f5a2fe 100644
--- a/src/bin/pg_basebackup/.gitignore
+++ b/src/bin/pg_basebackup/.gitignore
@@ -1,5 +1,6 @@
 /pg_basebackup
 /pg_receivewal
 /pg_recvlogical
+/pg_createsubscriber

10) The dry-run help message is not very clear; how about something similar to
pg_upgrade's message, like "check clusters only, don't change any data":
+       printf(_(" -d, --database=DBNAME               database to
create a subscription\n"));
+       printf(_(" -n, --dry-run                       stop before
modifying anything\n"));
+       printf(_(" -t, --recovery-timeout=SECS         seconds to wait
for recovery to end\n"));
+       printf(_(" -r, --retain                        retain log file
after success\n"));
+       printf(_(" -v, --verbose                       output verbose
messages\n"));
+       printf(_(" -V, --version                       output version
information, then exit\n"));

Regards,
Vignesh



Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Wed, Feb 7, 2024 at 10:24 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote:
>
> Thanks for updating the patch!
>
>
> Thanks for taking a look.
>
> >
> I'm still working on the data structures to group options. I don't like the way
> it was grouped in v13-0005. There is too many levels to reach database name.
> The setup_subscriber() function requires the 3 data structures.
> >
>
> Right, your refactoring looks fewer stack. So I pause to revise my refactoring
> patch.
>
>
> I didn't complete this task yet so I didn't include it in this patch.
>
> >
> The documentation update is almost there. I will include the modifications in
> the next patch.
> >
>
> OK. I think it should be modified before native speakers will attend to the
> thread.
>
>
> Same for this one.
>
> >
> Regarding v13-0004, it seems a good UI that's why I wrote a comment about it.
> However, it comes with a restriction that requires a similar HBA rule for both
> regular and replication connections. Is it an acceptable restriction? We might
> paint ourselves into the corner. A reasonable proposal is not to remove this
> option. Instead, it should be optional. If it is not provided, primary_conninfo
> is used.
> >
>
> I didn't have such a point of view. However, it is not related whether -P exists
> or not. Even v14-0001 requires primary to accept both normal/replication connections.
> If we want to avoid it, the connection from pg_createsubscriber can be restored
> to replication-connection.
> (I felt we do not have to use replication protocol even if we change the connection mode)
>
>
> That's correct. We made a decision to use non physical replication connections
> (besides the one used to connect primary <-> standby). Hence, my concern about
> a HBA rule falls apart. I'm not convinced that using a modified
> primary_conninfo is the only/right answer to fill the subscription connection
> string. Physical vs logical replication has different requirements when we talk
> about users. The physical replication requires only the REPLICATION privilege.
> On the other hand, to create a subscription you must have the privileges of
> pg_create_subscription role and also CREATE privilege on the specified
> database. Unless, you are always recommending the superuser for this tool, I'm
> afraid there will be cases where reusing primary_conninfo will be an issue. The
> more I think about this UI, the more I think that, if it is not hundreds of
> lines of code, it should use primary_conninfo when -P is not specified.
>
> The motivation why -P is not needed is to ensure the consistency of nodes.
> pg_createsubscriber assumes that the -P option can connect to the upstream node,
> but no one checks it. Parsing two connection strings may be a solution but be
> confusing. E.g., what if some options are different?
> I think using a same parameter is a simplest solution.
>
>
> Ugh. An error will occur the first time (get_primary_sysid) it tries to connect
> to primary.
>
> I found that no one refers the name of temporary slot. Can we remove the variable?
>
>
> It is gone. I did a refactor of the create_logical_replication_slot function.
> The slot name is assigned internally (no need for slot_name or temp_replslot) and
> the temporary parameter is included.
>
> Initialization by `CreateSubscriberOptions opt = {0};` seems enough.
> All values are set to 0x0.
>
>
> It is. However, I keep the assignments for three reasons: (a) there might be
> parameters whose default value is not zero, (b) the standard does not say that
> a null pointer must be represented by zero and (c) there is no harm in being
> paranoid during the initial assignment.
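To illustrate what `= {0}` does and does not guarantee, here is a minimal self-contained sketch; the struct members are hypothetical stand-ins, not the real CreateSubscriberOptions fields:

```c
#include <stdbool.h>
#include <stddef.h>

/* Trimmed, hypothetical version of the options struct discussed above */
typedef struct CreateSubscriberOptions
{
	char	   *subscriber_dir;		/* pointer member */
	int			recovery_timeout;	/* numeric member */
	bool		dry_run;			/* flag member */
} CreateSubscriberOptions;

/*
 * Aggregate initialization with "= {0}" sets pointer members to null
 * pointers and arithmetic members to zero, even on platforms where a null
 * pointer is not an all-zero bit pattern. Explicit assignments remain
 * useful for reason (a): members whose intended default is not zero.
 */
static CreateSubscriberOptions
default_options(void)
{
	CreateSubscriberOptions opt = {0};

	opt.recovery_timeout = 0;	/* explicit, in case the default changes */
	return opt;
}
```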
>
> You said "target server must be a standby" in [1], but I cannot find a check for it.
> IIUC, there are two approaches:
> a) check the existence "standby.signal" in the data directory
> b) call an SQL function "pg_is_in_recovery"
>
>
> I applied v16-0004 that implements option (b).
>
> I still think they can be combined as "bindir".
>
>
> I applied a patch that has a single variable for BINDIR.
>
> WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but I'm not
> sure that is good. These settings would remain on the new subscriber even after
> pg_createsubscriber finishes. Can we avoid it? I came up with passing these parameters
> via the pg_ctl -o option, but it requires parsing the output from GenerateRecoveryConfig()
> (all GUCs must be aligned like "-c XXX -c XXX -c XXX...").
>
>
> I applied a modified version of v16-0006.
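For reference, the pg_ctl -o alternative discussed above amounts to assembling an option string like the following; this is a hypothetical helper, not code from the patch, and a real version would also need to quote/escape GUC values:

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one GUC as "-c name=value " to an option string destined for
 * "pg_ctl -o", so the recovery settings die with the server process
 * instead of persisting in postgresql.auto.conf. Hypothetical sketch;
 * values containing spaces or quotes would need escaping.
 */
static void
append_guc_option(char *buf, size_t buflen, const char *name, const char *value)
{
	size_t		used = strlen(buf);

	snprintf(buf + used, buflen - used, "-c %s=%s ", name, value);
}
```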
>
> Function arguments should not be structs because they are passed by value.
> They should be pointers. Or, for modify_subscriber_sysid and wait_for_end_recovery,
> we can pass only the value which is really used.
>
>
> Done.
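The point of the change above can be sketched in a few lines; the struct member here is a trimmed, hypothetical version of the real LogicalRepInfo:

```c
#include <stdbool.h>

/* Trimmed, hypothetical version of the per-database state struct */
typedef struct LogicalRepInfo
{
	bool		made_replslot;
	bool		made_publication;
} LogicalRepInfo;

/*
 * Passing a pointer (rather than the struct by value) lets the callee
 * record progress in the caller's copy, which a later cleanup handler
 * inspects; a by-value copy would lose these updates and also copies the
 * whole struct on every call.
 */
static void
mark_replslot_created(LogicalRepInfo *dbinfo)
{
	dbinfo->made_replslot = true;
}
```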
>
> 07.
> ```
> static char *get_base_conninfo(char *conninfo, char *dbname,
>    const char *noderole);
> ```
>
> Not sure noderole should be passed here. It is used only for logging.
> Can we output the string before calling the function?
> (The parameter is not needed anymore if -P is removed)
>
>
> Done.
>
> 08.
> The terminology is still not consistent. Some functions refer to the target as "standby",
> but others call it "subscriber".
>
>
> The terminology should reflect the actual server role. I'm calling it "standby"
> if it is a physical replica and "subscriber" if it is a logical replica. Maybe
> "standby" isn't clear enough.
>
> 09.
> v14 does not work if the standby server already has recovery_target*
> options set. PSA the reproducer. I considered two approaches:
>
> a) raise an ERROR when these parameters are set. check_subscriber() can do it
> b) overwrite these GUCs with empty strings.
>
>
> I prefer (b); that's exactly what you provided in v16-0006.
>
> 10.
> The execution always fails if users execute --dry-run just before, because
> pg_createsubscriber stops the standby anyway. Doing a dry run first is a quite normal
> use case, so the current implementation seems not user-friendly. How should we fix it?
> The bullets below are my ideas:
>
> a) avoid stopping the standby in case of dry_run: seems possible.
> b) accept even if the standby is stopped: seems possible.
> c) start the standby at the end of run: how arguments like pg_ctl -l should be specified?
>
>
> I prefer (a). I applied a slightly modified version of v16-0005.
>
> This new patch contains the following changes:
>
> * check whether the target is really a standby server (0004)
> * refactor: pg_create_logical_replication_slot function
> * use a single variable for pg_ctl and pg_resetwal directory
> * avoid recovery errors applying default settings for some GUCs (0006)
> * don't stop/start the standby in dry run mode (0005)
> * miscellaneous fixes
>
> I don't understand why v16-0002 is required. In a previous version, this patch
> was using connections in logical replication mode. After some discussion we
> decided to change it to regular connections and use SQL functions (instead of
> replication commands). Is it a requirement for v16-0003?
>
> I started reviewing v16-0007 but haven't finished yet. The general idea is ok.
> However, I'm still worried about preventing some use cases if it provides only
> the local connection option. What if you want to keep monitoring this instance
> while the transformation is happening? Let's say it has a backlog that will
> take some time to apply. Unless you have a local agent, you have no data about
> this server until pg_createsubscriber terminates. Even the local agent might
> not connect to the server unless you use the current port.

I tried verifying a few scenarios using 5 databases and came across
the following errors:

./pg_createsubscriber -D ../new_standby -P "host=localhost port=5432
dbname=postgres" -S "host=localhost port=9000 dbname=postgres"  -d db1
-d db2 -d db3 -d db4 -d db5

pg_createsubscriber: error: publisher requires 6 wal sender
processes, but only 5 remain
pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 7.

It is successful only with 7 wal senders, so we can change error
messages accordingly.


pg_createsubscriber: error: publisher requires 6 replication slots,
but only 5 remain
pg_createsubscriber: hint: Consider increasing max_replication_slots
to at least 7.

It is successful only with 7 replication slots, so we can change error
messages accordingly.
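For what it's worth, the numbers line up if the check is sketched as follows; this is an assumption based on the design notes in this thread (one slot/walsender per database plus one for the consistent LSN, with the hint adding whatever is already in use, e.g. the standby's own physical replication connection), not the actual source:

```c
/*
 * Sketch of the capacity check behind the messages above (assumed design,
 * not actual pg_createsubscriber code): one replication slot/walsender per
 * database plus one temporary slot for the consistent LSN.
 */
static int
required_wal_senders(int num_dbs)
{
	return num_dbs + 1;			/* one per database + consistent-LSN slot */
}

/*
 * The hint adds consumers already in use, such as the standby's own
 * physical replication connection, so 5 databases yield "requires 6"
 * and "increase to at least 7".
 */
static int
suggested_max_wal_senders(int num_dbs, int in_use)
{
	return required_wal_senders(num_dbs) + in_use;
}
```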

Thanks and Regards,
Shubham Khanna,



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Further comments for v17.

01.
This program assumes that the target server has the same major version as this tool,
because the target server will be restarted by the same version's pg_ctl command.
I felt it should be ensured by reading PG_VERSION.
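A sketch of such a check; the helpers are hypothetical, and the real tool would compare against its built-in PG_MAJORVERSION:

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of the suggested check: parse the major version out
 * of the target cluster's PG_VERSION file contents and compare it with the
 * major version this tool was built for. PG_VERSION contains e.g. "17\n"
 * ("9.6\n" for pre-10 releases, where atoi() still yields the major part).
 */
static int
parse_major_version(const char *pg_version_contents)
{
	return atoi(pg_version_contents);
}

static bool
major_versions_match(const char *pg_version_contents, int tool_major)
{
	return parse_major_version(pg_version_contents) == tool_major;
}
```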

02.
pg_upgrade checks the version of the executables it uses, like pg_ctl, postgres, and
pg_resetwal. I felt this tool should do so as well.

03. get_bin_directory
```
    if (find_my_exec(path, full_path) < 0)
    {
        pg_log_error("The program \"%s\" is needed by %s but was not found in the\n"
                     "same directory as \"%s\".\n",
                     "pg_ctl", progname, full_path);
```

s/"pg_ctl"/progname

04.
Missing canonicalize_path()?

05.
Assume that the target server is a cascade standby, i.e., it also has a role as
a primary for another node. In this case, I thought the child node would not work, because
pg_createsubscriber runs pg_resetwal and all WAL files would be discarded at that
time. I have not tested it, but should the program detect this and exit earlier?

06.
wait_for_end_recovery() waits forever even if the standby has been disconnected
from the primary, right? Should we check the status of the replication via
pg_stat_wal_receiver?
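The suggested behavior can be sketched as a loop with two probes; in the real tool the probes would be SQL queries ("SELECT pg_is_in_recovery()" and a row check on pg_stat_wal_receiver), stubbed here for illustration only:

```c
#include <stdbool.h>

/*
 * Sketch of a recovery-wait loop that also bails out when the walreceiver
 * is gone, instead of waiting forever. Probe functions stand in for the
 * SQL queries the real tool would run between sleeps.
 */
typedef bool (*probe_fn) (void);

#define WAIT_PROMOTED		0
#define WAIT_DISCONNECTED	1
#define WAIT_TIMED_OUT		2

static int
wait_for_end_recovery_sketch(probe_fn in_recovery, probe_fn receiver_alive,
							 int max_attempts)
{
	for (int i = 0; i < max_attempts; i++)
	{
		if (!in_recovery())
			return WAIT_PROMOTED;
		if (!receiver_alive())
			return WAIT_DISCONNECTED;	/* don't wait forever */
		/* the real tool would sleep between probes */
	}
	return WAIT_TIMED_OUT;
}

/* stub probes for illustration */
static bool probe_true(void)  { return true; }
static bool probe_false(void) { return false; }
```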

07.
The cleanup function has a couple of bugs.

* If subscriptions have been created on the database, the function also tries to
  drop a publication. But it leads to an ERROR because the publication has already
  been dropped. See setup_subscriber().
* If the subscription has been created, drop_replication_slot() leads to an ERROR,
  because the slot was already dropped while executing DROP SUBSCRIPTION.

08.
I found that all messages (ERROR, WARNING, INFO, etc.) are output to stderr,
but I felt they should be on stdout. Is there a reason? pg_dump outputs messages to
stderr, but the motivation there might be to avoid confusion with dumps.

09.
I'm not sure the cleanup for the subscriber is really needed. Assume that there
are two databases, e.g., pg1 and pg2, and we fail to create a subscription on pg2.
This can happen when a subscription with the same name has already been
created on the primary server.
In this case the subscription on pg1 would be removed. But what is the next step?
Since the timeline ID on the standby server is larger than the primary's (note that
the standby has been promoted once), we cannot resume the physical replication
as-is. IIUC the easiest way to retry is removing the cluster and restarting
from pg_basebackup. If so, there is no need to clean up the standby because it is
corrupted. We just say "Please remove the cluster and recreate it again".

Here is a reproducer.

1. apply the txt patch atop 0001 patch.
2. run test_corruption.sh.
3. when you find the output below [1], connect to testdb from another terminal and
   run CREATE SUBSCRIPTION for the same subscription on the primary
4. Finally, pg_createsubscriber would fail the creation.

I also attached server logs of both nodes and the output.
Note again that this is a real issue. I used a tricky way to ensure the names overlap,
but this can happen randomly.

10.
While investigating #09, I found that we cannot properly report the reason why the
subscription cannot be created. The output said:

```
pg_createsubscriber: error: could not create subscription "pg_createsubscriber_16389_3884" on database "testdb": out of
memory
```

But the standby serverlog said:

```
ERROR:  subscription "pg_createsubscriber_16389_3884" already exists
STATEMENT:  CREATE SUBSCRIPTION pg_createsubscriber_16389_3884 CONNECTION 'user=postgres port=5431 dbname=testdb'
PUBLICATION pg_createsubscriber_16389 WITH (create_slot = false, copy_data = false, enabled = false)
 
```

[1]
```
pg_createsubscriber: creating the replication slot "pg_createsubscriber_16389_3884" on database "testdb"
pg_createsubscriber: XXX: sleep 20s
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

Since the original author seems a bit busy, I updated the patch set.

> 
> 01.
> ```
> /* Options */
> static const char *progname;
> 
> static char *primary_slot_name = NULL;
> static bool dry_run = false;
> 
> static bool success = false;
> 
> static LogicalRepInfo *dbinfo;
> static int    num_dbs = 0;
> ```
> 
> The comment seems out-of-date. There is only one option.

Changed the comment to /* Global variables */.

> 
> 02. check_subscriber and check_publisher
> 
> Missing pg_catalog prefix in some lines.

This has been already addressed in v18.

> 03. get_base_conninfo
> 
> I think dbname would not be set. IIUC, dbname should be a pointer to a pointer.

This has been already addressed in v18.

> 04.
> 
> I check the coverage and found two functions have been never called:
>  - drop_subscription
>  - drop_replication_slot
> 
> Also, some cases were not tested. The bullets below show notable ones for me.
> (Some of them would not be needed based on discussions)
> 
> * -r is specified
> * -t is specified
> * -P option contains dbname
> * -d is not specified
> * GUC settings are wrong
> * primary_slot_name is specified on the standby
> * standby server is not working
> 
> At the feature level, we may be able to check that the server log is surely removed in case
> of success.
> 
> So, which tests should be added? drop_subscription() is called only in the
> cleanup phase, so it may be difficult to test. According to the coverage, it seems that
> -r and -t are not tested. GUC settings have many test cases, so I'm not sure they
> should be added. Based on this, the others can be tested.

This has been already addressed in v18.

PSA my top-up patch set.

V19-0001: same as Euler's patch, v17-0001.
V19-0002: Update docs per recent changes. Also, some adjustments were done.
V19-0003: Modify the alignment of codes. Mostly same as v18-0002.
V19-0004: Change an argument of get_base_conninfo. Same as v18-0003.
=== experimental patches ===
V19-0005: Add testcases. Same as v18-0004.
V19-0006: Update a comment above global variables.
V19-0007: Address comments from Vignesh.
V19-0008: Fix error message in get_bin_directory().
V19-0009: Remove -P option. Same as v18-0005. 

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

Since the original author seems a bit busy, I updated the patch set.

> 1) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_replslot = false; should be there else there will be an
> error during  drop replication slot cleanup in error flow:
> +static void
> +drop_replication_slot(PGconn *conn, LogicalRepInfo *dbinfo, const
> char *slot_name)
> +{
> +       PQExpBuffer str = createPQExpBuffer();
> +       PGresult   *res;
> +
> +       Assert(conn != NULL);
> +
> +       pg_log_info("dropping the replication slot \"%s\" on database
> \"%s\"", slot_name, dbinfo->dbname);
> +
> +       appendPQExpBuffer(str, "SELECT
> pg_drop_replication_slot('%s')", slot_name);
> +
> +       pg_log_debug("command is: %s", str->data);

Fixed.

> 2) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_publication = false; should be there else there will be
> an error during drop publication cleanup in error flow:
> +/*
> + * Remove publication if it couldn't finish all steps.
> + */
> +static void
> +drop_publication(PGconn *conn, LogicalRepInfo *dbinfo)
> +{
> +       PQExpBuffer str = createPQExpBuffer();
> +       PGresult   *res;
> +
> +       Assert(conn != NULL);
> +
> +       pg_log_info("dropping publication \"%s\" on database \"%s\"",
> dbinfo->pubname, dbinfo->dbname);
> +
> +       appendPQExpBuffer(str, "DROP PUBLICATION %s", dbinfo->pubname);
> +
> +       pg_log_debug("command is: %s", str->data);
> 
> 3) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_subscription = false; should be there else there will be
> an error during drop publication cleanup in error flow:
> +/*
> + * Remove subscription if it couldn't finish all steps.
> + */
> +static void
> +drop_subscription(PGconn *conn, LogicalRepInfo *dbinfo)
> +{
> +       PQExpBuffer str = createPQExpBuffer();
> +       PGresult   *res;
> +
> +       Assert(conn != NULL);
> +
> +       pg_log_info("dropping subscription \"%s\" on database \"%s\"",
> dbinfo->subname, dbinfo->dbname);
> +
> +       appendPQExpBuffer(str, "DROP SUBSCRIPTION %s",
> dbinfo->subname);
> +
> +       pg_log_debug("command is: %s", str->data);

Fixed.

> 4) I was not sure if drop_publication is required here, as we will not
> create any publication in subscriber node:
> +               if (dbinfo[i].made_subscription)
> +               {
> +                       conn = connect_database(dbinfo[i].subconninfo);
> +                       if (conn != NULL)
> +                       {
> +                               drop_subscription(conn, &dbinfo[i]);
> +                               if (dbinfo[i].made_publication)
> +                                       drop_publication(conn, &dbinfo[i]);
> +                               disconnect_database(conn);
> +                       }
> +               }

Removed. But I'm not sure the cleanup is really meaningful.
See [1].

> 5) The connection should be disconnected in case of error case:
> +       /* secure search_path */
> +       res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
> +       if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +       {
> +               pg_log_error("could not clear search_path: %s",
> PQresultErrorMessage(res));
> +               return NULL;
> +       }
> +       PQclear(res);

PQfinish() was added.

> 6) There should be a line break before postgres_fe inclusion, to keep
> it consistent:
> + *-------------------------------------------------------------------------
> + */
> +#include "postgres_fe.h"
> +
> +#include <signal.h>

Added.

> 7) These includes are not required:
> 7.a) #include <signal.h>
> 7.b) #include <sys/stat.h>
> 7.c) #include <sys/wait.h>
> 7.d) #include "access/xlogdefs.h"
> 7.e) #include "catalog/pg_control.h"
> 7.f) #include "common/file_utils.h"
> 7.g) #include "utils/pidfile.h"

Removed.

> + *             src/bin/pg_basebackup/pg_createsubscriber.c
> + *
> + *-------------------------------------------------------------------------
> + */
> +#include "postgres_fe.h"
> +
> +#include <signal.h>
> +#include <sys/stat.h>
> +#include <sys/time.h>
> +#include <sys/wait.h>
> +#include <time.h>
> +
> +#include "access/xlogdefs.h"
> +#include "catalog/pg_authid_d.h"
> +#include "catalog/pg_control.h"
> +#include "common/connect.h"
> +#include "common/controldata_utils.h"
> +#include "common/file_perm.h"
> +#include "common/file_utils.h"
> +#include "common/logging.h"
> +#include "common/restricted_token.h"
> +#include "fe_utils/recovery_gen.h"
> +#include "fe_utils/simple_list.h"
> +#include "getopt_long.h"
> +#include "utils/pidfile.h"
> 
> 8) POSTMASTER_STANDBY and POSTMASTER_FAILED are not being used, is it
> required or kept for future purpose:
> +enum WaitPMResult
> +{
> +       POSTMASTER_READY,
> +       POSTMASTER_STANDBY,
> +       POSTMASTER_STILL_STARTING,
> +       POSTMASTER_FAILED
> +};

I think they are there because WaitPMResult was just ported from pg_ctl.c.
I renamed the enumeration and removed the unnecessary values.

> 9) pg_createsubscriber should be kept after pg_basebackup to maintain
> the consistent order:
> diff --git a/src/bin/pg_basebackup/.gitignore
> b/src/bin/pg_basebackup/.gitignore
> index 26048bdbd8..b3a6f5a2fe 100644
> --- a/src/bin/pg_basebackup/.gitignore
> +++ b/src/bin/pg_basebackup/.gitignore
> @@ -1,5 +1,6 @@
>  /pg_basebackup
>  /pg_receivewal
>  /pg_recvlogical
> +/pg_createsubscriber

Addressed.

> 10) dry-run help message is not very clear, how about something
> similar to pg_upgrade's message like "check clusters only, don't
> change any data":
> +       printf(_(" -d, --database=DBNAME               database to
> create a subscription\n"));
> +       printf(_(" -n, --dry-run                       stop before
> modifying anything\n"));
> +       printf(_(" -t, --recovery-timeout=SECS         seconds to wait
> for recovery to end\n"));
> +       printf(_(" -r, --retain                        retain log file
> after success\n"));
> +       printf(_(" -v, --verbose                       output verbose
> messages\n"));
> +       printf(_(" -V, --version                       output version
> information, then exit\n"));

Changed.

New patch is available in [2].

[1]:
https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077A6BB424A025F04A8243DF54F2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

I've replied for traceability.

> Further comments for v17.
> 
> 01.
> This program assumes that the target server has the same major version as this tool,
> because the target server will be restarted by the same version's pg_ctl command.
> I felt it should be ensured by reading the PG_VERSION.

Still investigating.

> 02.
> pg_upgrade checked the version of using executables, like pg_ctl, postgres, and
> pg_resetwal. I felt it should be as well.

Still investigating.

> 03. get_bin_directory
> ```
>     if (find_my_exec(path, full_path) < 0)
>     {
>         pg_log_error("The program \"%s\" is needed by %s but was not
> found in the\n"
>                      "same directory as \"%s\".\n",
>                      "pg_ctl", progname, full_path);
> ```
> 
> s/"pg_ctl"/progname

The message was updated.

> 04.
> Missing canonicalize_path()?

I found that find_my_exec() calls canonicalize_path(). No need to do it here.

> 05.
> Assuming that the target server is a cascade standby, i.e., it has a role as
> another primary. In this case, I thought the child node would not work. Because
> pg_createsubscriber runs pg_resetwal and all WAL files would be discarded at that
> time. I have not tested, but should the program detect it and exit earlier?

Still investigating.

> 06.
> wait_for_end_recovery() waits forever even if the standby has been disconnected
> from the primary, right? should we check the status of the replication via
> pg_stat_wal_receiver?

Still investigating.

> 07.
> The cleanup function has couple of bugs.
> 
> * If subscriptions have been created on the database, the function also tries to
>   drop a publication. But it leads an ERROR because it has been already dropped.
>   See setup_subscriber().
> * If the subscription has been created, drop_replication_slot() leads an ERROR.
>   Because the subscriber tried to drop the subscription while executing DROP
> SUBSCRIPTION.

Only drop_publication() was removed.

> 08.
> I found that all messages (ERROR, WARNING, INFO, etc...) would output to stderr,
> but I felt it should be on stdout. Is there a reason? pg_dump outputs messages to
> stderr, but the motivation might be to avoid confusion with dumps.

Still investigating.

> 09.
> I'm not sure the cleanup for subscriber is really needed. Assuming that there
> are two databases, e.g., pg1 pg2 , and we fail to create a subscription on pg2.
> This can happen when the subscription which has the same name has been
> already
> created on the primary server.
> In this case a subscription on pg1 would be removed. But what is the next step?
> Since a timelineID on the standby server is larger than the primary (note that
> the standby has been promoted once), we cannot resume the physical replication
> as-is. IIUC the easiest method to retry is removing a cluster once and restarting
> from pg_basebackup. If so, no need to cleanup the standby because it is
> corrupted.
> We just say "Please remove the cluster and recreate again".

I still think it should be, but not done yet.

New patch can be available in [1].

[1]:
https://www.postgresql.org/message-id/TYCPR01MB12077A6BB424A025F04A8243DF54F2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

Thanks for testing the patch!

> 
> I tried verifying few scenarios by using 5 databases and came across
> the following errors:
> 
> ./pg_createsubscriber -D ../new_standby -P "host=localhost port=5432
> dbname=postgres" -S "host=localhost port=9000 dbname=postgres"  -d db1
> -d db2 -d db3 -d db4 -d db5
> 
> pg_createsubscriber: error: publisher requires 6 wal sender
> processes, but only 5 remain
> pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 7.
> 
> It is successful only with 7 wal senders, so we can change error
> messages accordingly.
> 
> 
> pg_createsubscriber: error: publisher requires 6 replication slots,
> but only 5 remain
> pg_createsubscriber: hint: Consider increasing max_replication_slots
> to at least 7.
> 
> It is successful only with 7 replication slots, so we can change error
> messages accordingly.

I'm not the original author, but I don't think that is needed. The hint message
already suggests changing the value to 7. According to the docs [1], the primary
message should be factual and the hint message should be used for suggestions. I felt
the current code follows that style. Thoughts?

New patch is available in [2].

[1]: https://www.postgresql.org/docs/devel/error-style-guide.html
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077A6BB424A025F04A8243DF54F2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Feb 8, 2024, at 12:04 AM, Hayato Kuroda (Fujitsu) wrote:
>
Remember the target server was a standby (read only access). I don't expect an
application trying to modify it; unless it is a buggy application.
>

What if the client modifies the data just after the promotion?
Naively considered, all the changes can be accepted, but are there any issues?

If someone modifies data after promotion, fine; she has to deal with conflicts,
if any. IMO it is solved adding one or two sentences in the documentation.

>
Regarding
GUCs, almost all of them are PGC_POSTMASTER (so they cannot be modified unless the
server is restarted). The ones that are not PGC_POSTMASTER do not affect the
pg_createsubscriber execution [1].
>

IIUC, primary_conninfo and primary_slot_name are PGC_SIGHUP.

Ditto.

>
I'm just pointing out that this case is different from pg_upgrade (from which
this idea was taken). I'm not saying that's a bad idea. I'm just arguing that
you might be preventing some read-only access (monitoring) when it is
perfectly fine to connect to the database and execute queries. As I said
before, the current UI allows anyone to set up the standby to accept only local
connections. Of course, it is an extra step but it is possible. However, once
you apply v16-0007, there is no option but to use only local connections during the
transformation. Is it an acceptable limitation?
>

My remaining concern is written above. If they are not problematic, we may not have
to restrict them for now. At that time, the changes

1) overwriting a port number,
2) setting listen_addresses = ''

It can be implemented later if people are excited by it.

are not needed, right? IIUC the inconsistency of -P may still be problematic.

I still think we shouldn't have only the transformed primary_conninfo as
option.

>
pglogical_create_subscriber does nothing [2][3].
>

Oh, thanks.
Just to confirm - pglogical sets shared_preload_libraries to ''; should we follow that or not?

The in-core logical replication does not require any library to be loaded.


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Fri, Feb 9, 2024, at 3:27 AM, vignesh C wrote:
On Wed, 7 Feb 2024 at 10:24, Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote:
>
> Thanks for updating the patch!

Thanks for the updated patch, few comments:
Few comments:
1) Cleanup function handler flag should be reset, i.e.
dbinfo->made_replslot = false; should be there else there will be an
error during  drop replication slot cleanup in error flow:

Why? drop_replication_slot() is basically called by atexit().

2) Cleanup function handler flag should be reset, i.e.
dbinfo->made_publication = false; should be there else there will be
an error during drop publication cleanup in error flow:

Ditto. drop_publication() is basically called by atexit().


3) Cleanup function handler flag should be reset, i.e.
dbinfo->made_subscription = false; should be there else there will be
an error during drop publication cleanup in error flow:

Ditto. drop_subscription() is only called by atexit().

4) I was not sure if drop_publication is required here, as we will not
create any publication in subscriber node:
+               if (dbinfo[i].made_subscription)
+               {
+                       conn = connect_database(dbinfo[i].subconninfo);
+                       if (conn != NULL)
+                       {
+                               drop_subscription(conn, &dbinfo[i]);
+                               if (dbinfo[i].made_publication)
+                                       drop_publication(conn, &dbinfo[i]);
+                               disconnect_database(conn);
+                       }
+               }

setup_subscriber() explains the reason.

        /*
         * Since the publication was created before the consistent LSN, it is
         * available on the subscriber when the physical replica is promoted.
         * Remove publications from the subscriber because it has no use.
         */
        drop_publication(conn, &dbinfo[i]);

I changed the referred code a bit because it was not reliable. Since
made_subscription is not set until we create the subscription, the
publications that were created on the primary and replicated to the standby won't be
removed on the subscriber. Instead, it should rely on the recovery state to decide
whether to drop them.

5) The connection should be disconnected in case of error case:
+       /* secure search_path */
+       res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not clear search_path: %s",
PQresultErrorMessage(res));
+               return NULL;
+       }
+       PQclear(res);

connect_database() is usually followed by a NULL test and exit(1) if it cannot
connect. It should be added for correctness, but it is not a requirement.

7) These includes are not required:
7.a) #include <signal.h>
7.b) #include <sys/stat.h>
7.c) #include <sys/wait.h>
7.d) #include "access/xlogdefs.h"
7.e) #include "catalog/pg_control.h"
7.f) #include "common/file_utils.h"
7.g) #include "utils/pidfile.h"

Good catch. I was about to review the include files.

8) POSTMASTER_STANDBY and POSTMASTER_FAILED are not being used, is it
required or kept for future purpose:
+enum WaitPMResult
+{
+       POSTMASTER_READY,
+       POSTMASTER_STANDBY,
+       POSTMASTER_STILL_STARTING,
+       POSTMASTER_FAILED
+};

I just copied verbatim from pg_ctl. We should remove the superfluous states.

9) pg_createsubscriber should be kept after pg_basebackup to maintain
the consistent order:

Ok.


10) dry-run help message is not very clear, how about something
similar to pg_upgrade's message like "check clusters only, don't
change any data":

$ /tmp/pgdevel/bin/pg_archivecleanup --help | grep dry-run
  -n, --dry-run               dry run, show the names of the files that would be
$ /tmp/pgdevel/bin/pg_combinebackup --help | grep dry-run
  -n, --dry-run             don't actually do anything
$ /tmp/pgdevel/bin/pg_resetwal --help | grep dry-run
  -n, --dry-run          no update, just show what would be done
$ /tmp/pgdevel/bin/pg_rewind --help | grep dry-run
  -n, --dry-run                  stop before modifying anything
$ /tmp/pgdevel/bin/pg_upgrade --help | grep check  
  -c, --check                   check clusters only, don't change any data

I used the same sentence as pg_rewind but I'm fine with pg_upgrade or
pg_combinebackup sentences.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

>
If someone modifies data after promotion, fine; she has to deal with conflicts,
if any. IMO it is solved adding one or two sentences in the documentation.
>

OK. I could not find any issues, for now.

>
> >
> Regarding
> GUCs, almost all of them are PGC_POSTMASTER (so they cannot be modified unless the
> server is restarted). The ones that are not PGC_POSTMASTER do not affect the
> pg_createsubscriber execution [1].
> >
> 
> IIUC, primary_conninfo and primary_slot_name are PGC_SIGHUP.

Ditto.
>

Just to confirm - even if primary_slot_name is changed during
the conversion, the slot that was initially set will be dropped. Currently we do not
see any issues.

>
> >
> I'm just pointing out that this case is different from pg_upgrade (from which
> this idea was taken). I'm not saying that's a bad idea. I'm just arguing that
> you might be preventing some read-only access (monitoring) when it is
> perfectly fine to connect to the database and execute queries. As I said
> before, the current UI allows anyone to set up the standby to accept only local
> connections. Of course, it is an extra step but it is possible. However, once
> you apply v16-0007, there is no option but to use only local connections during the
> transformation. Is it an acceptable limitation?
> >
> 
> My remaining concern is written above. If they are not problematic, we may not have
> to restrict them for now. At that time, the changes
> 
> 1) overwriting a port number,
> 2) setting listen_addresses = ''

It can be implemented later if people are excited by it.

> are not needed, right? IIUC the inconsistency of -P may still be problematic.

I still think we shouldn't have only the transformed primary_conninfo as
option.
>


Hmm, OK. So let me summarize the current status and discussions.

Policy)

Basically, we do not prohibit connections to the primary/standby.
primary_slot_name may be changed during the conversion and
tuples may be inserted on the target just after the promotion, but there seem to be no issues.

API)

-D (data directory) and -d (databases) are definitely needed.

Regarding -P (a connection string for the source), we can require it for now.
But note that it may cause an inconsistency if the node pointed to by -P is different
from the node pointed to by primary_conninfo.

As for the connection string for the target server, we can choose between two ways:
a)
Accept a native connection string as -S. This can reuse the same parsing mechanism as -P,
but leaves room for a non-local server to be specified.

b)
Accept username/port as -U/-p.
(Given the policy above, listen_addresses would not be overwritten; the port just
specifies the listening port.)
This avoids connecting to a non-local server, but more options may be needed.
(E.g., is a socket directory needed? What about a password?)

Other discussion points / reported issues)

Points raised by me [1] are not solved yet.

* What if the target version is PG16 or earlier?
* What if the found executables have a different version from pg_createsubscriber?
* What if the target is sending WAL to another server?
   I.e., there are clusters like `node1->node2->node3`, and the target is node2.
* Can we really clean up the standby in case of failure?
   Shouldn't we suggest removing the target instead?
* Can we move outputs to stdout?

[1]:
https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Thu, Feb 15, 2024 at 10:37 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Euler,
>
> Here are my minor comments for 17.
>
> 01.
> ```
> /* Options */
> static const char *progname;
>
> static char *primary_slot_name = NULL;
> static bool dry_run = false;
>
> static bool success = false;
>
> static LogicalRepInfo *dbinfo;
> static int      num_dbs = 0;
> ```
>
> The comment seems out-of-date. There is only one option.
>
> 02. check_subscriber and check_publisher
>
> Missing pg_catalog prefix in some lines.
>
> 03. get_base_conninfo
>
> I think dbname would not be set. IIUC, dbname should be a pointer of the pointer.
>
>
> 04.
>
> I check the coverage and found two functions have been never called:
>  - drop_subscription
>  - drop_replication_slot
>
> Also, some cases were not tested. Below bullet showed notable ones for me.
> (Some of them would not be needed based on discussions)
>
> * -r is specified
> * -t is specified
> * -P option contains dbname
> * -d is not specified
> * GUC settings are wrong
> * primary_slot_name is specified on the standby
> * standby server is not working
>
> In feature level, we may able to check the server log is surely removed in case
> of success.
>
> So, which tests should be added? drop_subscription() is called only when the
> cleanup phase, so it may be difficult to test. According to others, it seems that
> -r and -t are not tested. GUC-settings have many test cases so not sure they
> should be. Based on this, others can be tested.
>
> PSA my top-up patch set.
>
> V18-0001: same as your patch, v17-0001.
> V18-0002: modify the alignment of codes.
> V18-0003: change an argument of get_base_conninfo. Per comment 3.
> === experimental patches ===
> V18-0004: Add testcases per comment 4.
> V18-0005: Remove -P option. I'm not sure it should be needed, but I made just in case.

I created a cascaded physical replication setup like
node1->node2->node3 and ran pg_createsubscriber on node2. After
running the script, I started node2 again and found that the
pg_createsubscriber command had succeeded, but the physical
replication between node2 and node3 was broken. I feel
pg_createsubscriber should check for this scenario and throw an error
in this case to avoid breaking the cascaded replication setup. I have
attached the script that was used to verify this.

Thanks and Regards,
Shubham Khanna.

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler, 

> Policy)
> 
> Basically, we do not prohibit connections to the primary/standby.
> primary_slot_name may be changed during the conversion and
> tuples may be inserted on the target just after the promotion, but that does not seem to cause issues.
> 
> API)
> 
> -D (data directory) and -d (databases) are definitely needed.
> 
> Regarding -P (the connection string for the source), we can require it for now.
> But note that it may cause an inconsistency if the node pointed to by -P is
> different from the node pointed to by primary_conninfo.
> 
> As for the connection string for the target server, we can choose between two ways:
> a)
> Accept a native connection string as -S. This can reuse the same parsing
> mechanism as -P,
> but leaves room for a non-local server to be specified.
> 
> b)
> Accept username/port as -U/-p.
> (Given the policy above, listen_addresses would not be overwritten; the port
> just specifies the listening port.)
> This avoids connecting to a non-local server, but more options may be needed.
> (E.g., is a socket directory needed? What about a password?)
> 
> Other discussion points / reported issues)
> 
> Points raised by me [1] are not solved yet.
> 
> * What if the target version is PG16 or earlier?
> * What if the found executables have a different version from pg_createsubscriber?
> * What if the target is sending WAL to another server?
>    I.e., there are clusters like `node1->node2->node3`, and the target is node2.
> * Can we really clean up the standby in case of failure?
>    Shouldn't we suggest removing the target instead?
> * Can we move outputs to stdout?

Based on the discussion, I updated the patch set. Feel free to pick these up and include them.
The patch removing -P was dropped, but the one removing -S still remains.

Also, while testing the patch set, I found some issues.

1.
Cfbot got angry [1]. This is because WIFEXITED and others are defined in <sys/wait.h>,
but the inclusion was removed per comment. Added the inclusion again.

2.
As Shubham pointed out [3], when we convert an intermediate node of a cascading replication
setup, the last node gets stuck. This is because a walreceiver process requires that nodes
have the same system identifier (in WalReceiverMain), but it is changed by pg_createsubscriber.

3.
Moreover, when we convert the last node of a cascade, it won't work either, because we
cannot create publications on a standby node.

4.
If the standby server was initialized by PG16 or earlier, this command would fail,
because the API of pg_logical_create_replication_slot() was changed.

5.
Also, the pg_ctl command used must have the same version as the instance.
I think we should require that all the executables and servers be the same major version.

Based on the above, here is a description of the attached patches:

V20-0001: same as Euler's patch, v17-0001.
V20-0002: Update docs per recent changes. Same as v19-0002
V20-0003: Modify the alignment of codes. Same as v19-0003
V20-0004: Change an argument of get_base_conninfo. Same as v19-0004
=== experimental patches ===
V20-0005: Add testcases. Same as v19-0004
V20-0006: Update a comment above global variables. Same as v19-0005
V20-0007: Address comments from Vignesh. Some parts you don't like
          are reverted.
V20-0008: Fix error message in get_bin_directory(). Same as v19-0008
V20-0009: Remove -S option. Refactored from v16-0007
V20-0010: Add check versions of executables and the target, per above and [4]
V20-0011: Detect a disconnection while waiting for the recovery, per [4]
V20-0012: Avoid running pg_createsubscriber for cascaded physical replication, per above.

[1]: https://cirrus-ci.com/task/4619792833839104
[2]: https://www.postgresql.org/message-id/CALDaNm1r9ZOwZamYsh6MHzb%3D_XvhjC_5XnTAsVecANvU9FOz6w%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAHv8RjJcUY23ieJc5xqg6-QeGr1Ppp4Jwbu7Mq29eqCBTDWfUw%40mail.gmail.com
[4]:
https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Feb 15, 2024, at 8:23 AM, Hayato Kuroda (Fujitsu) wrote:
> Points raised by me [1] are not solved yet.

> * What if the target version is PG16-?

pg_ctl and pg_resetwal won't work.

$ pg_ctl start -D /tmp/blah
waiting for server to start....
2024-02-15 23:50:03.448 -03 [364610] FATAL:  database files are incompatible with server
2024-02-15 23:50:03.448 -03 [364610] DETAIL:  The data directory was initialized by PostgreSQL version 16, which is not compatible with this version 17devel.
stopped waiting
pg_ctl: could not start server
Examine the log output.

$ pg_resetwal -D /tmp/blah
pg_resetwal: error: data directory is of wrong version
pg_resetwal: detail: File "PG_VERSION" contains "16", which is not compatible with this program's version "17".
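The failure mode above could also be caught up front by reading PG_VERSION from the target data directory. A minimal sketch of such a check, assuming a hypothetical TOOL_MAJOR_VERSION constant (the real tool would compare against its own compiled-in major version, e.g. a PG_MAJORVERSION_NUM-style value):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tool's compiled-in major version. */
#define TOOL_MAJOR_VERSION 17

/*
 * Parse the major version from the contents of a PG_VERSION file
 * (e.g. "16\n" or "9.6\n"). Returns -1 if the contents are unparsable.
 */
static int
parse_pg_version(const char *contents)
{
    char   *end;
    long    major = strtol(contents, &end, 10);

    if (end == contents || major <= 0)
        return -1;              /* not a version number */
    return (int) major;
}

/* True only when the data directory matches the tool's major version. */
static int
version_is_compatible(const char *pg_version_contents)
{
    return parse_pg_version(pg_version_contents) == TOOL_MAJOR_VERSION;
}
```

This mirrors what pg_resetwal itself does with PG_VERSION, but lets pg_createsubscriber report the mismatch before starting any server.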

> * What if the found executables have a different version from pg_createsubscriber?

The new code takes care of it.

> * What if the target is sending WAL to another server?
>    I.e., there are clusters like `node1->node2->node3`, and the target is node2.

The new code detects if the server is in recovery and aborts as you suggested.
A new option can be added to ignore the fact there are servers receiving WAL
from it.

> * Can we really cleanup the standby in case of failure?
>    Shouldn't we suggest to remove the target once?

If it finishes the promotion, no. I adjusted the cleanup routine a bit to avoid
it. However, we should provide instructions informing the user that they should
create a fresh standby and try again.

> * Can we move outputs to stdout?

Are you suggesting to use another logging framework? It is not a good idea
because each client program is already using common/logging.c.

1.
Cfbot got angry [1]. This is because WIFEXITED and others are defined in <sys/wait.h>,
but the inclusion was removed per comment. Added the inclusion again.

Ok.

2.
As Shubham pointed out [3], when we convert an intermediate node of a cascading replication
setup, the last node gets stuck. This is because a walreceiver process requires that nodes
have the same system identifier (in WalReceiverMain), but it is changed by pg_createsubscriber.

Hopefully it was fixed.

3.
Moreover, when we convert the last node of a cascade, it won't work either, because we
cannot create publications on a standby node.

Ditto.

4.
If the standby server was initialized by PG16 or earlier, this command would fail,
because the API of pg_logical_create_replication_slot() was changed.

See comment above.

5.
Also, the pg_ctl command used must have the same version as the instance.
I think we should require that all the executables and servers be the same major version.

It is enforced by the new code. See find_other_exec() in get_exec_path().

Based on the above, here is a description of the attached patches:

Thanks for another review. I'm sharing a new patch to merge a bunch of
improvements and fixes. Comments are below.

v20-0002: I did some extensive documentation changes (including some of them
related to the changes in the new patch). I will defer updating it until I check
v20-0002. It will be included in the next version.

v20-0003: I included most of it. There are a few things that pgindent reverted,
so I didn't apply those. I also didn't like some SQL commands that were broken into
multiple lines with spaces at the beginning. It looks nice in the code but not
in the output.

v20-0004: Nice catch. Applied.

v20-0005: Applied.

v20-0006: I prefer to remove the comment.

v20-0007: I partially applied it. I only removed the states that were not used
and proposed another dry-run mode message. Maybe it is clearer than it was.

v20-0008: I refactored the get_bin_directory code. On reflection, I reverted
the unified binary directory approach that we agreed on a few days ago. The main reason is
to provide a specific error message for each program it is using. The
get_exec_path will check if the program is available in the same directory as
pg_createsubscriber and if it has the same version. An absolute path is
returned and is used by some functions.

v20-0009: to be reviewed.

v20-0010: As I said above, this code was refactored so I didn't apply this one.

v20-0011: Do we really want to interrupt the recovery if the network was
momentarily interrupted or if the OS killed the walsender? Recovery is critical for
the process. I think we should do our best to be resilient and deliver all
changes required by the new subscriber. The proposal is not correct because the
query returns no tuples if it is disconnected, so you cannot PQgetvalue(). The
retry interval (the time until ServerLoop() creates another walreceiver) is
determined by DetermineSleepTime() and is a maximum of 5 seconds
(SIGKILL_CHILDREN_AFTER_SECS). One idea is to retry 2 or 3 times before giving up
on the pg_stat_wal_receiver query. Do you have a better plan?
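The retry idea reduces to a small piece of bookkeeping: tolerate up to N consecutive checks in which the walreceiver is missing, resetting the counter whenever it reappears. A sketch of just that decision logic (names and the once-per-second cadence are assumptions, not the patch's actual code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the proposed retry policy: alive[i] records whether the
 * walreceiver (per pg_stat_wal_receiver) was running at check i, with
 * checks roughly once per second; recovery finishes after n checks.
 * We give up only after max_misses consecutive checks without a
 * walreceiver, since the postmaster normally restarts it within a few
 * seconds.
 */
static bool
survives_until_recovery(const bool *alive, int n, int max_misses)
{
    int misses = 0;

    for (int i = 0; i < n; i++)
    {
        if (alive[i])
            misses = 0;         /* walreceiver came back: reset the count */
        else if (++misses >= max_misses)
            return false;       /* gone too long: abort the wait */
    }
    return true;                /* recovery ended before we gave up */
}
```

With max_misses around 5-10, a transient kill-and-restart of the walreceiver would not abort the conversion, while a permanently broken connection still would.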

v20-0012: I applied a different patch to accomplish the same thing. I included
a refactoring around the pg_is_in_recovery() function so it can be used in two other places.

Besides that, I changed some SQL commands to avoid superfluous
whitespace in them. I also added a test for the cascaded replication scenario. And
cleaned up the 041 test a bit.

I didn't provide an updated documentation because I want to check v20-0002. It
is on my list to check v20-0009.


--
Euler Taveira

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch!
Before reviewing deeply, here are replies for your comments.

>
> > Points raised by me [1] are not solved yet.
> > 
> > * What if the target version is PG16-?
pg_ctl and pg_resetwal won't work.
$ pg_ctl start -D /tmp/blah
waiting for server to start....
2024-02-15 23:50:03.448 -03 [364610] FATAL:  database files are incompatible with server
2024-02-15 23:50:03.448 -03 [364610] DETAIL:  The data directory was initialized by PostgreSQL version 16, which is not compatible with this version 17devel.
stopped waiting
pg_ctl: could not start server
Examine the log output.

$ pg_resetwal -D /tmp/blah
pg_resetwal: error: data directory is of wrong version
pg_resetwal: detail: File "PG_VERSION" contains "16", which is not compatible with this program's version "17".

> > * What if the found executables have a different version from pg_createsubscriber?

The new code takes care of it.
>

I preferred to have a common path and test them one by one, but I agree this works well.
Let's keep it and hear opinions from others.

>
> > * What if the target is sending WAL to another server?
> >    I.e., there are clusters like `node1->node2->node3`, and the target is node2.
The new code detects if the server is in recovery and aborts as you suggested.
A new option can be added to ignore the fact there are servers receiving WAL
from it.
>

Confirmed it can detect.

>
> > * Can we really cleanup the standby in case of failure?
> >    Shouldn't we suggest to remove the target once?

If it finishes the promotion, no. I adjusted the cleanup routine a bit to avoid
it. However, we should provide instructions to inform the user that it should
create a fresh standby and try again.
>

I think the cleanup function is not sufficient. In v21, recovery_ended stays
true even after drop_publication() is done in setup_subscriber(). I think
made_subscription is not needed, and the function should output some messages
when recovery_ended is on.
Besides, in the case of pg_upgrade, it always reports the message below before starting
the migration. I think this is more helpful for users.

```
    pg_log(PG_REPORT, "\n"
           "If pg_upgrade fails after this point, you must re-initdb the\n"
           "new cluster before continuing.");
```

>
> > * Can we move outputs to stdout?

Are you suggesting to use another logging framework? It is not a good idea
because each client program is already using common/logging.c.
>

Hmm, indeed. Other programs in pg_basebackup seem to use the framework.

>
v20-0011: Do we really want to interrupt the recovery if the network was
momentarily interrupted or if the OS killed walsender? Recovery is critical for
the process. I think we should do our best to be resilient and deliver all
changes required by the new subscriber.
>

It might be too strict to raise an ERROR as soon as we encounter a disconnection.
Also, 0011 was wrong: it should be PQgetvalue(res, 0, 1) for still_alive.

>
The proposal is not correct because the
query return no tuples if it is disconnected so you cannot PQgetvalue().
>

Sorry for the misunderstanding, but you might be confused. pg_createsubscriber
sends a query to the target, and we are discussing the disconnection between the
target and source instances. I think the connection that pg_createsubscriber
holds is still alive, so PQgetvalue() can get a value.

(BTW, the callers of server_is_in_recovery() have not considered a disconnection from
the target...)

>
The
retry interval (the time that ServerLoop() will create another walreceiver) is
determined by DetermineSleepTime() and it is a maximum of 5 seconds
(SIGKILL_CHILDREN_AFTER_SECS). One idea is to retry 2 or 3 times before give up
using the pg_stat_wal_receiver query. Do you have a better plan?
>

It's good to determine a threshold. It can define the acceptable number of
walreceiver losses during the loop.
I think we should retry at least until the postmaster revives the walreceiver.
The check runs once per second, so more than 5 (or 10) may be better.
Thoughts?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Thu, Feb 15, 2024 at 4:53 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Euler,
>
> > Policy)
> >
> > Basically, we do not prohibit to connect to primary/standby.
> > primary_slot_name may be changed during the conversion and
> > tuples may be inserted on target just after the promotion, but it seems no issues.
> >
> > API)
> >
> > -D (data directory) and -d (databases) are definitively needed.
> >
> > Regarding -P (the connection string for the source), we can require it for now.
> > But note that it may cause an inconsistency if the node pointed to by -P is different
> > from the node pointed to by primary_conninfo.
> >
> > As for the connection string for the target server, we can choose two ways:
> > a)
> > accept native connection string as -S. This can reuse the same parsing
> > mechanism as -P,
> > but there is a room that non-local server is specified.
> >
> > b)
> > accept username/port as -U/-p
> > (Since the policy is like above, listen_addresses would not be overwritten. Also,
> > the port just specify the listening port).
> > This can avoid connecting to non-local, but more options may be needed.
> > (E.g., Is socket directory needed? What about password?)
> >
> > Other discussing point, reported issue)
> >
> > Points raised by me [1] are not solved yet.
> >
> > * What if the target version is PG16 or earlier?
> > * What if the found executables have a different version from pg_createsubscriber?
> > * What if the target is sending WAL to another server?
> >    I.e., there are clusters like `node1->node2->node3`, and the target is node2.
> > * Can we really clean up the standby in case of failure?
> >    Shouldn't we suggest removing the target instead?
> > * Can we move outputs to stdout?
>
> Based on the discussion, I updated the patch set. Feel free to pick them and include.
> Removing -P patch was removed, but removing -S still remained.
>
> Also, while testing the patch set, I found some issues.
>
> 1.
> Cfbot got angry [1]. This is because WIFEXITED and others are defined in <sys/wait.h>,
> but the inclusion was removed per comment. Added the inclusion again.
>
> 2.
> As Shubham pointed out [3], when we convert an intermediate node of a cascading replication
> setup, the last node gets stuck. This is because a walreceiver process requires that nodes
> have the same system identifier (in WalReceiverMain), but it is changed by pg_createsubscriber.
>
> 3.
> Moreover, when we convert a last node of cascade, it won't work well. Because we cannot create
> publications on the standby node.
>
> 4.
> If the standby server was initialized by PG16 or earlier, this command would fail,
> because the API of pg_logical_create_replication_slot() was changed.
>
> 5.
> Also, the pg_ctl command used must have the same version as the instance.
> I think we should require that all the executables and servers be the same major version.
>
> Based on them, below part describes attached ones:
>
> V20-0001: same as Euler's patch, v17-0001.
> V20-0002: Update docs per recent changes. Same as v19-0002
> V20-0003: Modify the alignment of codes. Same as v19-0003
> V20-0004: Change an argument of get_base_conninfo. Same as v19-0004
> === experimental patches ===
> V20-0005: Add testcases. Same as v19-0004
> V20-0006: Update a comment above global variables. Same as v19-0005
> V20-0007: Address comments from Vignesh. Some parts you don't like
>           are reverted.
> V20-0008: Fix error message in get_bin_directory(). Same as v19-0008
> V20-0009: Remove -S option. Refactored from v16-0007
> V20-0010: Add check versions of executables and the target, per above and [4]
> V20-0011: Detect a disconnection while waiting the recovery, per [4]
> V20-0012: Avoid running pg_createsubscriber for cascade physical replication, per above.
>
> [1]: https://cirrus-ci.com/task/4619792833839104
> [2]: https://www.postgresql.org/message-id/CALDaNm1r9ZOwZamYsh6MHzb%3D_XvhjC_5XnTAsVecANvU9FOz6w%40mail.gmail.com
> [3]: https://www.postgresql.org/message-id/CAHv8RjJcUY23ieJc5xqg6-QeGr1Ppp4Jwbu7Mq29eqCBTDWfUw%40mail.gmail.com
> [4]:
https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

I found a couple of issues while verifying the cascaded replication
with the following scenarios:
Scenario 1) Create cascaded replication like node1->node2->node3
without using replication slots (the attached
cascade_3node_setup_without_slots has the script for this):
Then I ran pg_createsubscriber specifying node1 as the primary and
node3 as the standby; this scenario ran successfully. I was not sure
whether this should be supported or not.
Scenario 2) Create cascaded replication like node1->node2->node3 using
replication slots (the attached cascade_3node_setup_with_slots has the
script for this):
Here, slot1 was used between node1 and node2, and slot2 between node2
and node3. Then I ran pg_createsubscriber specifying node1 as the
primary and node3 as the standby. In this case pg_createsubscriber fails
with the following error:
pg_createsubscriber: error: could not obtain replication slot
information: got 0 rows, expected 1 row
[Inferior 1 (process 2623483) exited with code 01]

This is failing because slot2 is used between node2 and node3, but
pg_createsubscriber checked for slot1, the slot used for replication
between node1 and node2.
Thoughts?

Thanks and Regards,
Shubham Khanna.

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

Thanks for testing. It seems you ran with the v20 patch set, but I confirmed
it can be reproduced with v21.

> 
> I found a couple of issues, while verifying the cascaded replication
> with the following scenarios:
> Scenario 1) Create cascade replication like node1->node2->node3
> without using replication slots (attached
> cascade_3node_setup_without_slots has the script for this):
> Then I ran pg_createsubscriber by specifying primary as node1 and
> standby as node3, this scenario runs successfully. I was not sure if
> this should be supported or not?

Hmm. After the script, the cascading setup would be broken. The replication topology would be:

```
Node1 -> node2
  |
Node3
```

And the operation is a bit strange. The consistent LSN is obtained from node1,
but node3 waits until it receives the record from node2.
Can this always succeed?

> Scenario 2) Create cascade replication like node1->node2->node3 using
> replication slots (attached cascade_3node_setup_with_slots has the
> script for this):
> Here, slot name was used as slot1 for node1 to node2 and slot2 for
> node2 to node3. Then I ran pg_createsubscriber by specifying primary
> as node1 and standby as node3. In this case pg_createsubscriber fails
> with the following error:
> pg_createsubscriber: error: could not obtain replication slot
> information: got 0 rows, expected 1 row
> [Inferior 1 (process 2623483) exited with code 01]
> 
> This is failing because slot name slot2 is used between node2->node3
> but pg_createsubscriber is checked for slot1, the slot which is used
> for replication between node1->node2.
> Thoughts?

Right. The inconsistency is quite strange.

Overall, I feel such a case must be rejected. How should we detect it in the checking phase?
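One detection idea (purely a sketch, with hypothetical names): during the checking phase, confirm that the standby's primary_slot_name exists among the physical slots reported by the -P server (e.g. from pg_replication_slots). In Shubham's scenario, node3's slot2 exists only on node2, so pointing -P at node1 would be rejected before any changes are made:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the standby's primary_slot_name is
 * among the physical slot names obtained from the -P server (e.g. via
 * "SELECT slot_name FROM pg_replication_slots WHERE slot_type = 'physical'").
 */
static bool
primary_has_slot(const char *primary_slot_name,
                 const char **source_slots, int nslots)
{
    if (primary_slot_name == NULL || primary_slot_name[0] == '\0')
        return true;            /* no slot configured: nothing to verify */

    for (int i = 0; i < nslots; i++)
        if (strcmp(source_slots[i], primary_slot_name) == 0)
            return true;

    return false;               /* -P likely points at the wrong node: reject */
}
```

This would not catch every -P/primary_conninfo mismatch (two nodes could coincidentally have same-named slots), but it rejects the reported inconsistency cheaply at check time.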

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Here are comments for v21.

01. main
```
    /* rudimentary check for a data directory. */
...
    /* subscriber PID file. */
```

The initial character must be uppercase, and the period is not needed.

02. check_data_directory
```
    snprintf(versionfile, MAXPGPATH, "%s/PG_VERSION", datadir);
```

You removed the version checking based on PG_VERSION, but I think it is still needed.
Indeed, v21 can detect that pg_ctl/pg_resetwal/pg_createsubscriber have a different
version, but it cannot detect that the started instance has a different version.
I.e., v20-0010 is partially needed.

03. store_pub_sub_info()
```
    SimpleStringListCell *cell;
```

This definition can be moved into the loop.

04. get_standby_sysid()
```
    pfree(cf);
```

This can be pg_pfree().

05. check_subscriber
```
    /* The target server must be a standby */
    if (server_is_in_recovery(conn) == 0)
    {
        pg_log_error("The target server is not a standby");
        return false;
    }
```

What if the function returns -1? Should we detect it (maybe the disconnection) here?

06. server_is_in_recovery
```
    ret = strcmp("t", PQgetvalue(res, 0, 0));

    PQclear(res);

    if (ret == 0)
        return 1;
    else if (ret > 0)
        return 0;
    else
        return -1;                /* should not happen */
```

But strcmp may return a negative value, right? Based on the comment atop the function,
we should not do that. I think we can use a ternary operator instead.

07. server_is_in_recovery

In the first place, no one considers that this returns -1. So can we change it to a bool
function and raise pg_fatal() in case of error?
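The suggested shape could look like the following sketch (illustrative only; the real function runs `SELECT pg_catalog.pg_is_in_recovery()` over libpq and would call pg_fatal() on query failure rather than abort()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Map the single "t"/"f" value returned by
 * "SELECT pg_catalog.pg_is_in_recovery()" to a bool. Anything else is
 * treated as a fatal error (abort() here stands in for pg_fatal()),
 * replacing the current -1 return value that no caller checks.
 */
static bool
recovery_flag_to_bool(const char *value)
{
    if (strcmp(value, "t") == 0)
        return true;
    if (strcmp(value, "f") == 0)
        return false;
    abort();                    /* should not happen; real code: pg_fatal() */
}
```

Making the function bool also removes the problematic three-way strcmp mapping discussed above.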

08. check_subscriber
```
    if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
    {
        pg_log_error("permission denied for function \"%s\"",
                     "pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
        return false;
    }
```

I think the third argument must be 2.

09. check_subscriber
```
    pg_log_debug("subscriber: primary_slot_name: %s", primary_slot_name);
```

The output seems strange if the primary_slot_name is not set.

10. setup_publisher()
```
    PGconn       *conn;
    PGresult   *res;
```

Definitions can be in the loop.

11. create_publication()
```
    if (PQntuples(res) == 1)
    {
        /*
         * If publication name already exists and puballtables is true, let's
         * use it. A previous run of pg_createsubscriber must have created
         * this publication. Bail out.
         */
```

Hmm, but pre-existing publications may not publish INSERT/UPDATE/DELETE/TRUNCATE.
They should be checked if we really want to reuse them.
(I think it is OK to just raise an ERROR.)

12. create_publication()

Based on the above, we do not have to check before creating publications. The publisher
can detect the duplication. I prefer that.

13. create_logical_replication_slot()
```
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            pg_log_error("could not create replication slot \"%s\" on database \"%s\": %s",
                         slot_name, dbinfo->dbname,
                         PQresultErrorMessage(res));
            return lsn;
        }
```

I know lsn is always NULL, but can we use `return NULL`?

14. setup_subscriber()
```
    PGconn       *conn;

```

This definition can be in the loop.


15.

You said that in case of failure, cleanup is not needed if the process exits soon [1].
But some functions call PQfinish() and then exit(1) or pg_fatal(). Should we follow that?

16.

Some places refer to PGresult or PGconn even after the cleanup. They must be fixed.
```
        PQclear(res);
        disconnect_database(conn);
        pg_fatal("could not get system identifier: %s",
                 PQresultErrorMessage(res));
```

I think this is the root cause of why the wrong error message is sometimes output.


17.

Some places call PQerrorMessage() and other places call PQresultErrorMessage().
I think PQerrorMessage() should be used only after the connection establishment
functions. Thoughts?

18. 041_pg_createsubscriber_standby.pl
```
use warnings;
```

We must set "FATAL = all";

19.
```
my $node_p;
my $node_f;
my $node_s;
my $node_c;
my $result;
my $slotname;
```

I could not find forward declarations in other perl files.
The node names might be a bit confusing, but I could not find better ones.

20.
```
# On node P
# - create databases
# - create test tables
# - insert a row
# - create a physical replication slot
$node_p->safe_psql(
    'postgres', q(
    CREATE DATABASE pg1;
    CREATE DATABASE pg2;
));
$node_p->safe_psql('pg1', 'CREATE TABLE tbl1 (a text)');
$node_p->safe_psql('pg1', "INSERT INTO tbl1 VALUES('first row')");
$node_p->safe_psql('pg2', 'CREATE TABLE tbl2 (a text)');
my $slotname = 'physical_slot';
$node_p->safe_psql('pg2',
    "SELECT pg_create_physical_replication_slot('$slotname')");
```

I think the setup of the same node can be gathered into one place.
Also, any settings and definitions should be done just before they are used.

21.
```
$node_s->append_conf(
    'postgresql.conf', qq[
log_min_messages = debug2
primary_slot_name = '$slotname'
]);
```

I could not find a reason why this is set to debug2.

22.
```
command_fails
```

command_checks_all() can check the returned value and the outputs.
Should we use it?

23.

Can you add a header for each test case? E.g.,

```
# ------------------------------
# Check pg_createsubscriber fails when the target server is not a
# standby of the source.
...
# ------------------------------
# Check pg_createsubscriber fails when the target server is not running
...
# ------------------------------
# Check pg_createsubscriber fails when the target server is a member of
# the cascading standby.
...
# ------------------------------
# Check successful dry-run 
...
# ------------------------------
# Check successful conversion
```

24.
```
# Stop node C
$node_c->teardown_node;
...
$node_p->stop;
$node_s->stop;
$node_f->stop;
```
Why did you choose teardown?

25.

The creation of subscriptions is not directly tested. @subnames contains the
names of the subscriptions, but the test just assumes there are more than two of them.

Since it may be useful, I will post a top-up patch on Monday if there are no updates.

[1]: https://www.postgresql.org/message-id/89ccf72b-59f9-4317-b9fd-1e6d20a0c3b1%40app.fastmail.com
[2]:
https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

> Since it may be useful, I will post a top-up patch on Monday if there are no
> updates.

And here are top-up patches. Feel free to check and include.

v22-0001: Same as v21-0001.
=== rebased patches ===
v22-0002: Update docs per recent changes. Same as v20-0002.
v22-0003: Add check versions of the target. Extracted from v20-0003.
v22-0004: Remove -S option. Mostly same as v20-0009, but the commit message was
          slightly changed.
=== Newbie ===
V22-0005: Addressed my comments which seem to be trivial [1].
          Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
v22-0006: Consider the scenario where commands fail after the recovery.
          drop_subscription() is removed and some messages are added per [2].
V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, 7, were addressed here.
V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
V22-0009: Prohibit reuse publications when it has already existed. Per comments #11 and 12 [1].
V22-0010: Avoid to call PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
V22-0011: Update testcode. Per comments #17- [1].

I did not handle below points because I have unclear points.

a. 
This patch set cannot detect the disconnection between the target (standby) and
source (primary) during the catch up. Because the connection status must be gotten
at the same time (=in the same query) with the recovery status, but now it is now an
independed function (server_is_in_recovery()).

b.
This patch set cannot detect the inconsistency reported by Shubham [3]. I could not
come up with solutions without removing -P...
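For what it's worth, point (a) could be addressed by folding both checks into a single round trip. The sketch below is only an illustration of that idea, not what the patch does; the combined query and the helper name are assumptions (pg_stat_wal_receiver being empty once the walreceiver dies is used as the disconnection signal):

```c
#include <string.h>

/*
 * Hypothetical combined query: one round trip returns both the recovery
 * status and whether the standby is still connected to the primary
 * (pg_stat_wal_receiver has no row once the walreceiver is gone).
 */
static const char *recovery_query =
	"SELECT pg_catalog.pg_is_in_recovery(), "
	"       EXISTS (SELECT 1 FROM pg_catalog.pg_stat_wal_receiver)";

/*
 * Interpret the two boolean columns ("t"/"f") of the query above.
 * Returns 1 while recovery is in progress and the walreceiver is alive,
 * 0 when recovery has ended, and -1 when the standby lost its primary.
 */
static int
interpret_recovery_status(const char *in_recovery, const char *receiver_alive)
{
	if (strcmp(in_recovery, "t") != 0)
		return 0;				/* recovery finished; promotion done */
	if (strcmp(receiver_alive, "t") != 0)
		return -1;				/* still in recovery but disconnected */
	return 1;					/* keep waiting */
}
```

With something like this, the wait loop could abort as soon as the return value is -1 instead of waiting forever.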

[1]:
https://www.postgresql.org/message-id/TYCPR01MB12077756323B79042F29DDAEDF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077E98F930C3DE6BD304D0DF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CAHv8Rj%2B5mzK9Jt%2B7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
Some review of the v21 patch:

- commit message

Mention pg_createsubscriber in the commit message title.  That's the 
most important thing that someone doing git log searches in the future 
will be looking for.


- doc/src/sgml/ref/allfiles.sgml

Move the new entry to alphabetical order.


- doc/src/sgml/ref/pg_createsubscriber.sgml

+  <para>
+   The <application>pg_createsubscriber</application> should be run at the target
+   server. The source server (known as publisher server) should accept logical
+   replication connections from the target server (known as subscriber server).
+   The target server should accept local logical replication connection.
+  </para>

"should" -> "must" ?

+ <refsect1>
+  <title>Options</title>

Sort options alphabetically.

It would be good to indicate somewhere which options are mandatory.

+ <refsect1>
+  <title>Examples</title>

I suggest including a pg_basebackup call into this example, so it's
easier for readers to get the context of how this is supposed to be
used.  You can add that pg_basebackup in this example is just an example 
and that other base backups can also be used.


- doc/src/sgml/reference.sgml

Move the new entry to alphabetical order.


- src/bin/pg_basebackup/Makefile

Move the new sections to alphabetical order.


- src/bin/pg_basebackup/meson.build

Move the new sections to alphabetical order.


- src/bin/pg_basebackup/pg_createsubscriber.c

+typedef struct CreateSubscriberOptions
+typedef struct LogicalRepInfo

I think these kinds of local-use struct don't need to be typedef'ed.
(Then you also don't need to update typedefs.list.)

+static void
+usage(void)

Sort the options alphabetically.

+static char *
+get_exec_path(const char *argv0, const char *progname)

Can this not use find_my_exec() and find_other_exec()?

+int
+main(int argc, char **argv)

Sort the options alphabetically (long_options struct, getopt_long()
argument, switch cases).


- .../t/040_pg_createsubscriber.pl
- .../t/041_pg_createsubscriber_standby.pl

These two files could be combined into one.

+# Force it to initialize a new cluster instead of copying a
+# previously initdb'd cluster.

Explain why?

+$node_s->append_conf(
+    'postgresql.conf', qq[
+log_min_messages = debug2

Is this setting necessary for the test?




Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

I have reviewed the v21 patch. And found an issue.

Initially I started the standby server with a new postgresql.conf file
(not the default postgresql.conf that is present in the instance).
pg_ctl -D ../standby start -o "-c config_file=/new_path/postgresql.conf"

And I have made 'max_replication_slots = 1' in new postgresql.conf and
made  'max_replication_slots = 0' in the default postgresql.conf file.
Now when we run pg_createsubscriber on standby we get error:
pg_createsubscriber: error: could not set replication progress for the
subscription "pg_createsubscriber_5_242843": ERROR:  cannot query or
manipulate replication origin when max_replication_slots = 0
NOTICE:  dropped replication slot "pg_createsubscriber_5_242843" on publisher
pg_createsubscriber: error: could not drop publication
"pg_createsubscriber_5" on database "postgres": ERROR:  publication
"pg_createsubscriber_5" does not exist
pg_createsubscriber: error: could not drop replication slot
"pg_createsubscriber_5_242843" on database "postgres": ERROR:
replication slot "pg_createsubscriber_5_242843" does not exist

I observed that when we run the pg_createsubscriber command, it will
stop the standby instance (the non-default postgres configuration) and
restart the standby instance which will now be started with default
postgresql.conf, where the 'max_replication_slot = 0' and
pg_createsubscriber will now fail with the error given above.
I have added the script file with which we can reproduce this issue.
Also similar issues can happen with other configurations such as port, etc.

The possible solutions would be:
1) Allow running pg_createsubscriber only when the standby is initially stopped.
I observed that pglogical_create_subscriber also uses this approach.
2) Read GUCs via the SHOW command and restore them when the server restarts.

I would prefer the 1st solution.
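For the 2nd solution, a rough sketch of the idea: before stopping the standby, capture the result of "SHOW config_file" and pass it back when pg_createsubscriber restarts the server. The helper name and command layout below are hypothetical; the real tool drives the restart through pg_ctl internally:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the pg_ctl start command, reusing the
 * config_file value obtained earlier via "SHOW config_file" so the
 * restarted standby keeps the same non-default configuration.
 */
static int
build_start_command(char *buf, size_t len,
					const char *pg_ctl_path, const char *datadir,
					const char *config_file)
{
	return snprintf(buf, len,
					"\"%s\" start -D \"%s\" -o \"-c config_file=%s\"",
					pg_ctl_path, datadir, config_file);
}
```

This would also need to cover other settings passed on the original command line (port, etc.), which is why option 1 is simpler.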

Thanks and Regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Feb 19, 2024, at 6:47 AM, Peter Eisentraut wrote:
Some review of the v21 patch:

Thanks for checking.

- commit message

Mention pg_createsubscriber in the commit message title.  That's the 
most important thing that someone doing git log searches in the future 
will be looking for.

Right. Done.

- doc/src/sgml/ref/allfiles.sgml

Move the new entry to alphabetical order.

Done.


- doc/src/sgml/ref/pg_createsubscriber.sgml

+  <para>
+   The <application>pg_createsubscriber</application> should be run at the target
+   server. The source server (known as publisher server) should accept logical
+   replication connections from the target server (known as subscriber server).
+   The target server should accept local logical replication connection.
+  </para>

"should" -> "must" ?

Done.

+ <refsect1>
+  <title>Options</title>

Sort options alphabetically.

Done.

It would be good to indicate somewhere which options are mandatory.

I'll add this information in the option description. AFAICT the synopsis kind
of indicates it.

+ <refsect1>
+  <title>Examples</title>

I suggest including a pg_basebackup call into this example, so it's
easier for readers to get the context of how this is supposed to be
used.  You can add that pg_basebackup in this example is just an example 
and that other base backups can also be used.

We can certainly add it, but isn't creating a standby out of scope here? I will
make sure to include references to pg_basebackup and the "Setting up a Standby
Server" section.

- doc/src/sgml/reference.sgml

Move the new entry to alphabetical order.

Done.

- src/bin/pg_basebackup/Makefile

Move the new sections to alphabetical order.

Done.

- src/bin/pg_basebackup/meson.build

Move the new sections to alphabetical order.

Done.


- src/bin/pg_basebackup/pg_createsubscriber.c

+typedef struct CreateSubscriberOptions
+typedef struct LogicalRepInfo

I think these kinds of local-use struct don't need to be typedef'ed.
(Then you also don't need to update typedefs.list.)

Done.

+static void
+usage(void)

Sort the options alphabetically.

Are you referring to s/options/functions/?

+static char *
+get_exec_path(const char *argv0, const char *progname)

Can this not use find_my_exec() and find_other_exec()?

It does indeed use them. I created this function because the same code path
needs to run twice (for pg_ctl and pg_resetwal).

+int
+main(int argc, char **argv)

Sort the options alphabetically (long_options struct, getopt_long()
argument, switch cases).

Done.

- .../t/040_pg_createsubscriber.pl
- .../t/041_pg_createsubscriber_standby.pl

These two files could be combined into one.

Done.

+# Force it to initialize a new cluster instead of copying a
+# previously initdb'd cluster.

Explain why?

Ok. It needs a new cluster because it will have a different system identifier
so we can make sure the target cluster is a copy of the source server.

+$node_s->append_conf(
+ 'postgresql.conf', qq[
+log_min_messages = debug2

Is this setting necessary for the test?

No. It is here as a debugging aid. Better to include it in a separate patch.
There are a few messages that I don't intend to include in the final patch.

All of these modifications will be included in the next patch. I'm finishing
integrating the patches proposed by Hayato [1] along with some additional fixes
and refactoring.




--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Feb 19, 2024, at 7:22 AM, Shlok Kyal wrote:
I have reviewed the v21 patch. And found an issue.

Initially I started the standby server with a new postgresql.conf file
(not the default postgresql.conf that is present in the instance).
pg_ctl -D ../standby start -o "-c config_file=/new_path/postgresql.conf"

And I have made 'max_replication_slots = 1' in new postgresql.conf and
made  'max_replication_slots = 0' in the default postgresql.conf file.
Now when we run pg_createsubscriber on standby we get error:
pg_createsubscriber: error: could not set replication progress for the
subscription "pg_createsubscriber_5_242843": ERROR:  cannot query or
manipulate replication origin when max_replication_slots = 0

That's by design. See [1]. The max_replication_slots parameter is used as the
maximum number of subscriptions on the server.

NOTICE:  dropped replication slot "pg_createsubscriber_5_242843" on publisher
pg_createsubscriber: error: could not drop publication
"pg_createsubscriber_5" on database "postgres": ERROR:  publication
"pg_createsubscriber_5" does not exist
pg_createsubscriber: error: could not drop replication slot
"pg_createsubscriber_5_242843" on database "postgres": ERROR:
replication slot "pg_createsubscriber_5_242843" does not exist

That's a bug and should be fixed.



--
Euler Taveira

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

On Tue, 20 Feb 2024 at 06:59, Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Feb 19, 2024, at 7:22 AM, Shlok Kyal wrote:
>
> I have reviewed the v21 patch. And found an issue.
>
> Initially I started the standby server with a new postgresql.conf file
> (not the default postgresql.conf that is present in the instance).
> pg_ctl -D ../standby start -o "-c config_file=/new_path/postgresql.conf"
>
> And I have made 'max_replication_slots = 1' in new postgresql.conf and
> made  'max_replication_slots = 0' in the default postgresql.conf file.
> Now when we run pg_createsubscriber on standby we get error:
> pg_createsubscriber: error: could not set replication progress for the
> subscription "pg_createsubscriber_5_242843": ERROR:  cannot query or
> manipulate replication origin when max_replication_slots = 0
>
>
> That's by design. See [1]. The max_replication_slots parameter is used as the
> maximum number of subscriptions on the server.
>
> NOTICE:  dropped replication slot "pg_createsubscriber_5_242843" on publisher
> pg_createsubscriber: error: could not drop publication
> "pg_createsubscriber_5" on database "postgres": ERROR:  publication
> "pg_createsubscriber_5" does not exist
> pg_createsubscriber: error: could not drop replication slot
> "pg_createsubscriber_5_242843" on database "postgres": ERROR:
> replication slot "pg_createsubscriber_5_242843" does not exist
>
>
> That's a bug and should be fixed.
>
> [1] https://www.postgresql.org/docs/current/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SUBSCRIBER
>
I think you misunderstood the issue I reported.

My main concern is that the standby server is using a different
'postgresql.conf' file (the default file) after:
+   /* Start subscriber and wait until accepting connections */
+   pg_log_info("starting the subscriber");
+   if (!dry_run)
+       start_standby_server(pg_ctl_path, opt.subscriber_dir, server_start_log);

But we initially started the standby server (before running
pg_createsubscriber) with a new postgresql.conf file (different from the
default file; for this example, created inside the 'new_path' folder).
pg_ctl -D ../standby -l standby.log  start -o "-c
config_file=../new_path/postgresql.conf"

So, when we run pg_createsubscriber, all the initial checks run on the
standby server started with the new postgresql.conf file. But
pg_createsubscriber then restarts the standby server using the default
postgresql.conf file, and this might create some issues.

Thanks and Regards,
Shlok Kyal



Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 19 Feb 2024 at 11:15, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > Since it may be useful, I will post a top-up patch on Monday if there are no
> > updates.
>
> And here are top-up patches. Feel free to check and include.
>
> v22-0001: Same as v21-0001.
> === rebased patches ===
> v22-0002: Update docs per recent changes. Same as v20-0002.
> v22-0003: Add version checks of the target. Extracted from v20-0003.
> v22-0004: Remove -S option. Mostly same as v20-0009, but the commit message was
>           slightly changed.
> === Newbie ===
> V22-0005: Addressed my comments which seemed to be trivial [1].
>           Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
> v22-0006: Consider the scenario where commands fail after the recovery.
>           drop_subscription() is removed and some messages are added per [2].
> V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, and 7 were addressed here.
> V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
> V22-0009: Prohibit reusing a publication when it already exists. Per comments #11 and 12 [1].
> V22-0010: Avoid calling PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
> V22-0011: Update the test code. Per comments #17- [1].
>
> I did not handle the points below because some things are unclear to me.
>
> a.
> This patch set cannot detect a disconnection between the target (standby) and
> source (primary) during the catch-up, because the connection status must be
> obtained at the same time (i.e., in the same query) as the recovery status,
> but that check is currently an independent function (server_is_in_recovery()).
>
> b.
> This patch set cannot detect the inconsistency reported by Shubham [3]. I could not
> come up with solutions without removing -P...
>

Few comments for v22-0001 patch:
1) The second "if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)"" should
be if (strcmp(PQgetvalue(res, 0, 2), "t") != 0):
+       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
+       {
+               pg_log_error("permission denied for database %s",
dbinfo[0].dbname);
+               return false;
+       }
+       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
+       {
+               pg_log_error("permission denied for function \"%s\"",
+
"pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
+               return false;
+       }
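For illustration, the corrected logic reads each boolean column at its own index. Here is a compilable sketch of the intent, with the column values mocked as plain strings standing in for PQgetvalue(res, 0, 0..2); the meaning of column 0 is an assumption on my part:

```c
#include <string.h>

/*
 * Sketch of the fix: the query returns three boolean columns and each
 * check must read its own column.  cols[] stands in for
 * PQgetvalue(res, 0, 0), (res, 0, 1) and (res, 0, 2).
 */
static int
check_permissions(const char *cols[3])
{
	if (strcmp(cols[0], "t") != 0)
		return 1;				/* first privilege check failed */
	if (strcmp(cols[1], "t") != 0)
		return 2;				/* permission denied for database */
	if (strcmp(cols[2], "t") != 0)	/* was wrongly reading column 1 */
		return 3;				/* permission denied for function */
	return 0;					/* all checks passed */
}
```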

2) pg_createsubscriber fails if a table is created in parallel on the
primary node:
2024-02-20 14:38:49.005 IST [277261] LOG:  database system is ready to
accept connections
2024-02-20 14:38:54.346 IST [277270] ERROR:  relation "public.tbl5"
does not exist
2024-02-20 14:38:54.346 IST [277270] STATEMENT:  CREATE SUBSCRIPTION
pg_createsubscriber_5_277236 CONNECTION ' dbname=postgres' PUBLICATION
pg_createsubscriber_5 WITH (create_slot = false, copy_data = false,
enabled = false)

If we are not planning to fix this, at least it should be documented

3) Error cases in verbose mode produce invalid error messages, like the
"out of memory" message below:
pg_createsubscriber: waiting the postmaster to reach the consistent state
pg_createsubscriber: postmaster reached the consistent state
pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
database "postgres"
pg_createsubscriber: creating subscription
"pg_createsubscriber_5_278343" on database "postgres"
pg_createsubscriber: error: could not create subscription
"pg_createsubscriber_5_278343" on database "postgres": out of memory

4) In error cases we try to drop this publication again, resulting in an error:
+               /*
+                * Since the publication was created before the
consistent LSN, it is
+                * available on the subscriber when the physical
replica is promoted.
+                * Remove publications from the subscriber because it
has no use.
+                */
+               drop_publication(conn, &dbinfo[i]);

Which throws these errors (because the publication is dropped multiple times):
pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
database "postgres"
pg_createsubscriber: error: could not drop publication
"pg_createsubscriber_5" on database "postgres": ERROR:  publication
"pg_createsubscriber_5" does not exist
pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
database "postgres"
pg_createsubscriber: dropping the replication slot
"pg_createsubscriber_5_278343" on database "postgres"

5) In error cases, wait_for_end_recovery waits even though it has
identified that replication between the primary and standby has
stopped:
+/*
+ * Is recovery still in progress?
+ * If the answer is yes, it returns 1, otherwise, returns 0. If an error occurs
+ * while executing the query, it returns -1.
+ */
+static int
+server_is_in_recovery(PGconn *conn)
+{
+       PGresult   *res;
+       int                     ret;
+
+       res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");
+
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               PQclear(res);
+               pg_log_error("could not obtain recovery progress");
+               return -1;
+       }
+

You can simulate this by stopping the primary just before
wait_for_end_recovery and you will see these error messages, but
pg_createsubscriber will continue to wait:
pg_createsubscriber: error: could not obtain recovery progress
pg_createsubscriber: error: could not obtain recovery progress
pg_createsubscriber: error: could not obtain recovery progress
pg_createsubscriber: error: could not obtain recovery progress
...

Regards,
Vignesh



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

Thanks for giving comments!

> Few comments for v22-0001 patch:
> 1) The second "if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)"" should
> be if (strcmp(PQgetvalue(res, 0, 2), "t") != 0):
> +       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
> +       {
> +               pg_log_error("permission denied for database %s",
> dbinfo[0].dbname);
> +               return false;
> +       }
> +       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
> +       {
> +               pg_log_error("permission denied for function \"%s\"",
> +
> "pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
> +               return false;
> +       }

I already pointed this out as comment #8 [1], and it is fixed in v22-0005.

> 2) pg_createsubscriber fails if a table is created in parallel on the
> primary node:
> 2024-02-20 14:38:49.005 IST [277261] LOG:  database system is ready to
> accept connections
> 2024-02-20 14:38:54.346 IST [277270] ERROR:  relation "public.tbl5"
> does not exist
> 2024-02-20 14:38:54.346 IST [277270] STATEMENT:  CREATE SUBSCRIPTION
> pg_createsubscriber_5_277236 CONNECTION ' dbname=postgres' PUBLICATION
> pg_createsubscriber_5 WITH (create_slot = false, copy_data = false,
> enabled = false)
> 
> If we are not planning to fix this, at least it should be documented

The error occurs when tables are created after the promotion, right?
I think it cannot be fixed until logical replication of DDL is implemented.
So, +1 for adding a description.

> 3) Error cases in verbose mode produce invalid error messages, like the
> "out of memory" message below:
> pg_createsubscriber: waiting the postmaster to reach the consistent state
> pg_createsubscriber: postmaster reached the consistent state
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: creating subscription
> "pg_createsubscriber_5_278343" on database "postgres"
> pg_createsubscriber: error: could not create subscription
> "pg_createsubscriber_5_278343" on database "postgres": out of memory

This is because some places wrongly use PQerrorMessage(); they should use
PQresultErrorMessage(). Fixed in v22-0005.

> 4) In error cases we try to drop this publication again resulting in error:
> +               /*
> +                * Since the publication was created before the
> consistent LSN, it is
> +                * available on the subscriber when the physical
> replica is promoted.
> +                * Remove publications from the subscriber because it
> has no use.
> +                */
> +               drop_publication(conn, &dbinfo[i]);
> 
> Which throws these errors (because the publication is dropped multiple times):
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: error: could not drop publication
> "pg_createsubscriber_5" on database "postgres": ERROR:  publication
> "pg_createsubscriber_5" does not exist
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: dropping the replication slot
> "pg_createsubscriber_5_278343" on database "postgres"

Right. One approach is to use a DROP PUBLICATION IF EXISTS statement.
Thoughts?
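A minimal sketch of that approach, assuming the identifier needs no extra quoting here (the real code would still send the string through PQexec() and check the result):

```c
#include <stdio.h>
#include <string.h>

/*
 * Sketch of the IF EXISTS approach: dropping this way is idempotent, so
 * a second cleanup pass no longer errors out when the publication is
 * already gone.  Identifier quoting is elided for brevity.
 */
static int
build_drop_publication(char *buf, size_t len, const char *pubname)
{
	return snprintf(buf, len, "DROP PUBLICATION IF EXISTS %s", pubname);
}
```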

> 5) In error cases, wait_for_end_recovery waits even though it has
> identified that the replication between primary and standby is
> stopped:
> +/*
> + * Is recovery still in progress?
> + * If the answer is yes, it returns 1, otherwise, returns 0. If an error occurs
> + * while executing the query, it returns -1.
> + */
> +static int
> +server_is_in_recovery(PGconn *conn)
> +{
> +       PGresult   *res;
> +       int                     ret;
> +
> +       res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");
> +
> +       if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +       {
> +               PQclear(res);
> +               pg_log_error("could not obtain recovery progress");
> +               return -1;
> +       }
> +
> 
> You can simulate this by stopping the primary just before
> wait_for_end_recovery and you will see these error messages, but
> pg_createsubscriber will continue to wait:
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress

Yeah, v22-0001 cannot detect a disconnection between the primary and standby.
v22-0007 can detect a standby crash, but the v22 set cannot detect a primary
crash. Euler came up with an approach [2] for it, but it is not implemented yet.


[1]:
https://www.postgresql.org/message-id/TYCPR01MB12077756323B79042F29DDAEDF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/2231a04b-f2d4-4a4e-b5cd-56be8b002427%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
vignesh C
Date:
On Tue, 20 Feb 2024 at 15:47, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> Thanks for giving comments!
>
> > Few comments for v22-0001 patch:
> > 1) The second "if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)"" should
> > be if (strcmp(PQgetvalue(res, 0, 2), "t") != 0):
> > +       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
> > +       {
> > +               pg_log_error("permission denied for database %s",
> > dbinfo[0].dbname);
> > +               return false;
> > +       }
> > +       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
> > +       {
> > +               pg_log_error("permission denied for function \"%s\"",
> > +
> > "pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
> > +               return false;
> > +       }
>
> I have already pointed out as comment #8 [1] and fixed in v22-0005.
>
> > 2) pg_createsubscriber fails if a table is created in parallel on the
> > primary node:
> > 2024-02-20 14:38:49.005 IST [277261] LOG:  database system is ready to
> > accept connections
> > 2024-02-20 14:38:54.346 IST [277270] ERROR:  relation "public.tbl5"
> > does not exist
> > 2024-02-20 14:38:54.346 IST [277270] STATEMENT:  CREATE SUBSCRIPTION
> > pg_createsubscriber_5_277236 CONNECTION ' dbname=postgres' PUBLICATION
> > pg_createsubscriber_5 WITH (create_slot = false, copy_data = false,
> > enabled = false)
> >
> > If we are not planning to fix this, at least it should be documented
>
> The error will be occurred when tables are created after the promotion, right?
> I think it cannot be fixed until DDL logical replication would be implemented.
> So, +1 to add descriptions.
>
> > 3) Error cases in verbose mode produce invalid error messages, like the
> > "out of memory" message below:
> > pg_createsubscriber: waiting the postmaster to reach the consistent state
> > pg_createsubscriber: postmaster reached the consistent state
> > pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> > database "postgres"
> > pg_createsubscriber: creating subscription
> > "pg_createsubscriber_5_278343" on database "postgres"
> > pg_createsubscriber: error: could not create subscription
> > "pg_createsubscriber_5_278343" on database "postgres": out of memory
>
> Because some places use PQerrorMessage() wrongly. It should be
> PQresultErrorMessage(). Fixed in v22-0005.
>
> > 4) In error cases we try to drop this publication again resulting in error:
> > +               /*
> > +                * Since the publication was created before the
> > consistent LSN, it is
> > +                * available on the subscriber when the physical
> > replica is promoted.
> > +                * Remove publications from the subscriber because it
> > has no use.
> > +                */
> > +               drop_publication(conn, &dbinfo[i]);
> >
> > Which throws these errors(because of drop publication multiple times):
> > pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> > database "postgres"
> > pg_createsubscriber: error: could not drop publication
> > "pg_createsubscriber_5" on database "postgres": ERROR:  publication
> > "pg_createsubscriber_5" does not exist
> > pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> > database "postgres"
> > pg_createsubscriber: dropping the replication slot
> > "pg_createsubscriber_5_278343" on database "postgres"
>
> Right. One approach is to use a DROP PUBLICATION IF EXISTS statement.
> Thoughts?

Another way would be to set made_publication to false in drop_publication()
once the publication is dropped. That way, cleanup_objects_atexit() will not
try to drop the publication again, since made_publication will already be
false.
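A small mock of that control flow (the libpq calls are elided; only the flag handling suggested above is modeled, with names matching the patch's made_publication):

```c
#include <stdbool.h>

/*
 * Mock of the flag-based alternative: drop_publication() clears
 * made_publication, so the atexit cleanup path skips the second drop.
 * drops_attempted only exists here to make the behavior observable.
 */
static bool made_publication = true;
static int	drops_attempted = 0;

static void
drop_publication_mock(void)
{
	if (!made_publication)
		return;					/* already dropped; nothing to do */
	drops_attempted++;			/* the real code runs DROP PUBLICATION here */
	made_publication = false;	/* the suggested fix */
}
```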

Regards,
Vignesh



Re: speed up a logical replica setup

From
Alvaro Herrera
Date:
On 2024-Feb-16, Hayato Kuroda (Fujitsu) wrote:

> 15.
> 
> You said that in case of failure, cleanup is not needed if the process exits
> soon [1]. But some functions call PQfinish() and then exit(1) or pg_fatal().
> Should we follow that?

Hmm, but doesn't this mean that the server will log an ugly message that
"client closed connection unexpectedly"?  I think it's nicer to close
the connection before terminating the process (especially since the
code for that is already written).

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"We’ve narrowed the problem down to the customer’s pants being in a situation
 of vigorous combustion" (Robert Haas, Postgres expert extraordinaire)



Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 19 Feb 2024 at 11:15, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > Since it may be useful, I will post a top-up patch on Monday if there are no
> > updates.
>
> And here are top-up patches. Feel free to check and include.
>
> v22-0001: Same as v21-0001.
> === rebased patches ===
> v22-0002: Update docs per recent changes. Same as v20-0002.
> v22-0003: Add version checks of the target. Extracted from v20-0003.
> v22-0004: Remove -S option. Mostly same as v20-0009, but the commit message was
>           slightly changed.
> === Newbie ===
> V22-0005: Addressed my comments which seemed to be trivial [1].
>           Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
> v22-0006: Consider the scenario where commands fail after the recovery.
>           drop_subscription() is removed and some messages are added per [2].
> V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, and 7 were addressed here.
> V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
> V22-0009: Prohibit reusing a publication when it already exists. Per comments #11 and 12 [1].
> V22-0010: Avoid calling PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
> V22-0011: Update the test code. Per comments #17- [1].
>
> I did not handle the points below because some things are unclear to me.
>
> a.
> This patch set cannot detect a disconnection between the target (standby) and
> source (primary) during the catch-up, because the connection status must be
> obtained at the same time (i.e., in the same query) as the recovery status,
> but that check is currently an independent function (server_is_in_recovery()).
>
> b.
> This patch set cannot detect the inconsistency reported by Shubham [3]. I could not
> come up with solutions without removing -P...

Few comments regarding the documentation:
1) The max_replication_slots information seems to be present a couple of times:

+    <para>
+     The target instance must have
+     <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+     and <link
linkend="guc-max-logical-replication-workers"><varname>max_logical_replication_workers</varname></link>
+     configured to a value greater than or equal to the number of target
+     databases.
+    </para>

+   <listitem>
+    <para>
+     The target instance must have
+     <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+     configured to a value greater than or equal to the number of target
+     databases and replication slots.
+    </para>
+   </listitem>

2) Can we add an id to prerequisites and use it instead of referring
to r1-app-pg_createsubscriber-1:
-     <application>pg_createsubscriber</application> checks if the
given target data
-     directory has the same system identifier than the source data directory.
-     Since it uses the recovery process as one of the steps, it starts the
-     target server as a replica from the source server. If the system
-     identifier is not the same,
<application>pg_createsubscriber</application> will
-     terminate with an error.
+     Checks the target can be converted.  In particular, things listed in
+     <link linkend="r1-app-pg_createsubscriber-1">above section</link> would be
+     checked.  If these are not met
<application>pg_createsubscriber</application>
+     will terminate with an error.
     </para>

3) The code also checks the following:
 Verify that a PostgreSQL binary (progname) is available in the same
directory as pg_createsubscriber.

But this is not present in the documentation's prerequisites.

4) Here we mention that the target server should be stopped, but the
same is not mentioned in prerequisites:
+   Here is an example of using <application>pg_createsubscriber</application>.
+   Before running the command, please make sure target server is stopped.
+<screen>
+<prompt>$</prompt> <userinput>pg_ctl -D /usr/local/pgsql/data stop</userinput>
+</screen>
+

5) If there is an error during any of the pg_createsubscriber
operations, such as a failure while creating a subscription, it might
not be possible to roll back to the earlier state that had
physical-standby replication. I feel we should document this and also
add it to the console message, as we do in the case of pg_upgrade.

Regards,
Vignesh



Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 19 Feb 2024 at 11:15, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > Since it may be useful, I will post top-up patch on Monday, if there are no
> > updating.
>
> And here are top-up patches. Feel free to check and include.
>
> v22-0001: Same as v21-0001.
> === rebased patches ===
> v22-0002: Update docs per recent changes. Same as v20-0002.
> v22-0003: Add check versions of the target. Extracted from v20-0003.
> v22-0004: Remove -S option. Mostly same as v20-0009, but commit massage was
>           slightly changed.
> === Newbie ===
> V22-0005: Addressed my comments which seems to be trivial[1].
>           Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
> v22-0006: Consider the scenario when commands are failed after the recovery.
>           drop_subscription() is removed and some messages are added per [2].
> V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, 7, were addressed here.
> V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
> V22-0009: Prohibit reuse publications when it has already existed. Per comments #11 and 12 [1].
> V22-0010: Avoid to call PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
> V22-0011: Update testcode. Per comments #17- [1].
>
> I did not handle below points because I have unclear points.
>
> a.
> This patch set cannot detect the disconnection between the target (standby) and
> source (primary) during the catch up. Because the connection status must be gotten
> at the same time (=in the same query) with the recovery status, but now it is now an
> independed function (server_is_in_recovery()).
>
> b.
> This patch set cannot detect the inconsistency reported by Shubham [3]. I could not
> come up with solutions without removing -P...

Few comments:
1) The below code can lead to assertion failure if the publisher is
stopped while dropping the replication slot:
+       if (primary_slot_name != NULL)
+       {
+               conn = connect_database(dbinfo[0].pubconninfo);
+               if (conn != NULL)
+               {
+                       drop_replication_slot(conn, &dbinfo[0],
primary_slot_name);
+               }
+               else
+               {
+                       pg_log_warning("could not drop replication
slot \"%s\" on primary",
+                                                  primary_slot_name);
+                       pg_log_warning_hint("Drop this replication
slot soon to avoid retention of WAL files.");
+               }
+               disconnect_database(conn);
+       }

pg_createsubscriber: error: connection to database failed: connection
to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or
directory
Is the server running locally and accepting connections on that socket?
pg_createsubscriber: warning: could not drop replication slot
"standby_1" on primary
pg_createsubscriber: hint: Drop this replication slot soon to avoid
retention of WAL files.
pg_createsubscriber: pg_createsubscriber.c:432: disconnect_database:
Assertion `conn != ((void *)0)' failed.
Aborted (core dumped)

This is happening because we call disconnect_database in the
connection failure case too, which has the following assert:
+static void
+disconnect_database(PGconn *conn)
+{
+       Assert(conn != NULL);
+
+       PQfinish(conn);
+}

2) There is a CheckDataVersion function which does exactly this; can
we use it instead:
+       /* Check standby server version */
+       if ((ver_fd = fopen(versionfile, "r")) == NULL)
+               pg_fatal("could not open file \"%s\" for reading: %m",
versionfile);
+
+       /* Version number has to be the first line read */
+       if (!fgets(rawline, sizeof(rawline), ver_fd))
+       {
+               if (!ferror(ver_fd))
+                       pg_fatal("unexpected empty file \"%s\"", versionfile);
+               else
+                       pg_fatal("could not read file \"%s\": %m", versionfile);
+       }
+
+       /* Strip trailing newline and carriage return */
+       (void) pg_strip_crlf(rawline);
+
+       if (strcmp(rawline, PG_MAJORVERSION) != 0)
+       {
+               pg_log_error("standby server is of wrong version");
+               pg_log_error_detail("File \"%s\" contains \"%s\",
which is not compatible with this program's version \"%s\".",
+                                                       versionfile,
rawline, PG_MAJORVERSION);
+               exit(1);
+       }
+
+       fclose(ver_fd);

3) Should this be added to typedefs.list:
+enum WaitPMResult
+{
+       POSTMASTER_READY,
+       POSTMASTER_STILL_STARTING
+};

4) pgCreateSubscriber should be mentioned after pg_controldata to keep
the ordering consistent:
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index aa94f6adf6..c5edd244ef 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -285,6 +285,7 @@
    &pgCtl;
    &pgResetwal;
    &pgRewind;
+   &pgCreateSubscriber;
    &pgtestfsync;

5) Here pg_replication_slots should be pg_catalog.pg_replication_slots:
+       if (primary_slot_name)
+       {
+               appendPQExpBuffer(str,
+                                                 "SELECT 1 FROM
pg_replication_slots "
+                                                 "WHERE active AND
slot_name = '%s'",
+                                                 primary_slot_name);

6) Here pg_settings should be pg_catalog.pg_settings:
+        * - max_worker_processes >= 1 + number of dbs to be converted
+        *------------------------------------------------------------------------
+        */
+       res = PQexec(conn,
+                                "SELECT setting FROM pg_settings
WHERE name IN ("
+                                "'max_logical_replication_workers', "
+                                "'max_replication_slots', "
+                                "'max_worker_processes', "
+                                "'primary_slot_name') "
+                                "ORDER BY name");

Regards,
Vignesh



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
> And here are top-up patches. Feel free to check and include.
>
> v22-0001: Same as v21-0001.
> === rebased patches ===
> v22-0002: Update docs per recent changes. Same as v20-0002.
> v22-0003: Add check versions of the target. Extracted from v20-0003.
> v22-0004: Remove -S option. Mostly same as v20-0009, but commit massage was
>           slightly changed.
> === Newbie ===
> V22-0005: Addressed my comments which seems to be trivial[1].
>           Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
> v22-0006: Consider the scenario when commands are failed after the recovery.
>           drop_subscription() is removed and some messages are added per [2].
> V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, 7, were addressed here.
> V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
> V22-0009: Prohibit reuse publications when it has already existed. Per comments #11 and 12 [1].
> V22-0010: Avoid to call PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
> V22-0011: Update testcode. Per comments #17- [1].

I found some issues and fixed them with top-up patches
v23-0012 and v23-0013:
1.
Suppose there is a cascading physical replication chain node1->node2->node3.
Now if we run pg_createsubscriber with node1 as primary and node2 as
standby, pg_createsubscriber will be successful, but the connection
between node2 and node3 will not be retained and the log of node3 will
show the error:
2024-02-20 12:32:12.340 IST [277664] FATAL:  database system
identifier differs between the primary and standby
2024-02-20 12:32:12.340 IST [277664] DETAIL:  The primary's identifier
is 7337575856950914038, the standby's identifier is
7337575783125171076.
2024-02-20 12:32:12.341 IST [277491] LOG:  waiting for WAL to become
available at 0/3000F10

To fix this, I prevent pg_createsubscriber from running if the standby
node is a primary to any other server.
Made the change in the v23-0012 patch.

2.
While checking 'max_replication_slots' in the 'check_publisher' function,
we are not considering the temporary slot in the check:
+   if (max_repslots - cur_repslots < num_dbs)
+   {
+       pg_log_error("publisher requires %d replication slots, but
only %d remain",
+                    num_dbs, max_repslots - cur_repslots);
+       pg_log_error_hint("Consider increasing max_replication_slots
to at least %d.",
+                         cur_repslots + num_dbs);
+       return false;
+   }
Fixed this in v23-0013

v23-0001 to v23-0011 are the same as v22-0001 to v22-0011

Thanks and Regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 19 Feb 2024 at 11:15, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > Since it may be useful, I will post top-up patch on Monday, if there are no
> > updating.
>
> And here are top-up patches. Feel free to check and include.
>
> v22-0001: Same as v21-0001.
> === rebased patches ===
> v22-0002: Update docs per recent changes. Same as v20-0002.
> v22-0003: Add check versions of the target. Extracted from v20-0003.
> v22-0004: Remove -S option. Mostly same as v20-0009, but commit massage was
>           slightly changed.
> === Newbie ===
> V22-0005: Addressed my comments which seems to be trivial[1].
>           Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
> v22-0006: Consider the scenario when commands are failed after the recovery.
>           drop_subscription() is removed and some messages are added per [2].
> V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, 7, were addressed here.
> V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
> V22-0009: Prohibit reuse publications when it has already existed. Per comments #11 and 12 [1].
> V22-0010: Avoid to call PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
> V22-0011: Update testcode. Per comments #17- [1].

Few comments on the tests:
1) If the dry run stopped the server because of some issue, this
connection will simply fail; we can instead check "pg_ctl status" to
verify that the server is still running. Another way would be to check
that the stdout does not contain "postmaster was stopped" messages.
+
+# Check if node S is still a standby
+is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
+       't', 'standby is in recovery');

2) Can we add verification of  "postmaster was stopped" messages in
the stdout for dry run without --databases testcase
+# pg_createsubscriber can run without --databases option
+command_ok(
+       [
+               'pg_createsubscriber', '--verbose',
+               '--dry-run', '--pgdata',
+               $node_s->data_dir, '--publisher-server',
+               $node_p->connstr('pg1'), '--subscriber-server',
+               $node_s->connstr('pg1')
+       ],
+       'run pg_createsubscriber without --databases');
+

3) This message "target server must be running" seems to be wrong,
should it be cannot specify cascading replicating standby as standby
node(this is for v22-0011 patch :
+               'pg_createsubscriber', '--verbose', '--pgdata',
$node_c->data_dir,
+               '--publisher-server', $node_s->connstr('postgres'),
+               '--port', $node_c->port, '--socketdir', $node_c->host,
+               '--database', 'postgres'
        ],
-       'primary server is in recovery');
+       1,
+       [qr//],
+       [qr/primary server cannot be in recovery/],
+       'target server must be running');

4) Should this be "Wait for subscriber to catch up"
+# Wait subscriber to catch up
+$node_s->wait_for_subscription_sync($node_p, $subnames[0]);
+$node_s->wait_for_subscription_sync($node_p, $subnames[1]);

5) Should this be 'Subscriptions has been created on all the specified
databases'
+);
+is($result, qq(2),
+       'Subscriptions has been created to all the specified databases'
+);

6) Add a test to verify that current_user is not a member of
ROLE_PG_CREATE_SUBSCRIPTION, has no create permissions, and has no
permission to execute replication origin advance functions

7) Add tests to verify insufficient max_logical_replication_workers,
max_replication_slots and max_worker_processes for the subscriber
node

8) Add tests to verify invalid configuration of wal_level,
max_replication_slots and max_wal_senders for the publisher node

9) We can use the same node name in the comment and for the variable
+# Set up node P as primary
+$node_p = PostgreSQL::Test::Cluster->new('node_p');
+$node_p->init(allows_streaming => 'logical');
+$node_p->start;

10) Similarly, we can use node_f instead of F in the comments.
+# Set up node F as about-to-fail node
+# Force it to initialize a new cluster instead of copying a
+# previously initdb'd cluster.
+{
+       local $ENV{'INITDB_TEMPLATE'} = undef;
+
+       $node_f = PostgreSQL::Test::Cluster->new('node_f');
+       $node_f->init(allows_streaming => 'logical');
+       $node_f->start;

Regards,
Vignesh



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shlok,

> Hi,
> 
> I have reviewed the v21 patch. And found an issue.
> 
> Initially I started the standby server with a new postgresql.conf file
> (not the default postgresql.conf that is present in the instance).
> pg_ctl -D ../standby start -o "-c config_file=/new_path/postgresql.conf"
> 
> And I have made 'max_replication_slots = 1' in new postgresql.conf and
> made  'max_replication_slots = 0' in the default postgresql.conf file.
> Now when we run pg_createsubscriber on standby we get error:
> pg_createsubscriber: error: could not set replication progress for the
> subscription "pg_createsubscriber_5_242843": ERROR:  cannot query or
> manipulate replication origin when max_replication_slots = 0
> NOTICE:  dropped replication slot "pg_createsubscriber_5_242843" on publisher
> pg_createsubscriber: error: could not drop publication
> "pg_createsubscriber_5" on database "postgres": ERROR:  publication
> "pg_createsubscriber_5" does not exist
> pg_createsubscriber: error: could not drop replication slot
> "pg_createsubscriber_5_242843" on database "postgres": ERROR:
> replication slot "pg_createsubscriber_5_242843" does not exist
> 
> I observed that when we run the pg_createsubscriber command, it will
> stop the standby instance (the non-default postgres configuration) and
> restart the standby instance which will now be started with default
> postgresql.conf, where the 'max_replication_slot = 0' and
> pg_createsubscriber will now fail with the error given above.
> I have added the script file with which we can reproduce this issue.
> Also similar issues can happen with other configurations such as port, etc.

Possible. So the issue is that GUC settings might change after the restart,
so the verification phase may not be enough. There are similar GUCs in [1]
that may have similar issues. E.g., if "hba_file" is set when the server
is started, the file cannot be seen after the restart, so pg_createsubscriber
may not be able to connect anymore.

> The possible solution would be
> 1) allow to run pg_createsubscriber if standby is initially stopped .
> I observed that pg_logical_createsubscriber also uses this approach.
> 2) read GUCs via SHOW command and restore them when server restarts
>

I also prefer the first solution.
Another reason why the standby should be stopped is for backup purposes.
Basically, the standby instance should be saved before running pg_createsubscriber.
The easiest way is a hard copy, and the postmaster should be stopped at that time.
I feel it is better if users can run the command immediately after the copying.
Thoughts?

[1]: https://www.postgresql.org/docs/current/storage-file-layout.html

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

Since no one has posted updates, here is a new patch set.
Note that comments from Peter E. were not addressed because
Euler seems to have already fixed them.

> 3) Error conditions is verbose mode has invalid error message like
> "out of memory" messages like in below:
> pg_createsubscriber: waiting the postmaster to reach the consistent state
> pg_createsubscriber: postmaster reached the consistent state
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: creating subscription
> "pg_createsubscriber_5_278343" on database "postgres"
> pg_createsubscriber: error: could not create subscription
> "pg_createsubscriber_5_278343" on database "postgres": out of memory

Descriptions were added.

> 4) In error cases we try to drop this publication again resulting in error:
> +               /*
> +                * Since the publication was created before the
> consistent LSN, it is
> +                * available on the subscriber when the physical
> replica is promoted.
> +                * Remove publications from the subscriber because it
> has no use.
> +                */
> +               drop_publication(conn, &dbinfo[i]);
> 
> Which throws these errors(because of drop publication multiple times):
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: error: could not drop publication
> "pg_createsubscriber_5" on database "postgres": ERROR:  publication
> "pg_createsubscriber_5" does not exist
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: dropping the replication slot
> "pg_createsubscriber_5_278343" on database "postgres"

Changed to ... IF EXISTS. I thought your other proposal was not good
because, if the flag was turned off, the publication on the publisher node
could not be dropped.

> 5) In error cases, wait_for_end_recovery waits even though it has
> identified that the replication between primary and standby is
> stopped:
> +/*
> + * Is recovery still in progress?
> + * If the answer is yes, it returns 1, otherwise, returns 0. If an error occurs
> + * while executing the query, it returns -1.
> + */
> +static int
> +server_is_in_recovery(PGconn *conn)
> +{
> +       PGresult   *res;
> +       int                     ret;
> +
> +       res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");
> +
> +       if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +       {
> +               PQclear(res);
> +               pg_log_error("could not obtain recovery progress");
> +               return -1;
> +       }
> +
> 
> You can simulate this by stopping the primary just before
> wait_for_end_recovery and you will see these error messages, but
> pg_createsubscriber will continue to wait:
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress

Based on an idea from Euler, I roughly implemented it. Thoughts?

0001-0013 were not changed from the previous version.

V24-0014: addressed your comment in the replied e-mail.
V24-0015: Add disconnect_database() again, per [3]
V24-0016: addressed your comment in [4].
V24-0017: addressed your comment in [5].
V24-0018: addressed your comment in [6].

[1]: https://www.postgresql.org/message-id/3ee79f2c-e8b3-4342-857c-a31b87e1afda%40eisentraut.org
[2]: https://www.postgresql.org/message-id/CALDaNm1ocVQmWhUJqxJDmR8N%3DCTbrH5GCdFU72ywnVRV6dND2A%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/202402201053.6jjvdrm7kahf%40alvherre.pgsql
[4]: https://www.postgresql.org/message-id/CALDaNm2qHuZZvh6ym6OM367RfozU7RaaRDSm%3DF8M3SNrcQG2pQ%40mail.gmail.com
[5]: https://www.postgresql.org/message-id/CALDaNm3Q5W%3DEvphDjHA1n8ii5fv2DvxVShSmQLNFgeiHsOUwPg%40mail.gmail.com
[6]: https://www.postgresql.org/message-id/CALDaNm1M73ds0GBxX-XZX56f1D%2BGPojeCCwo-DLTVnfu8DMAvw%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Alvaro,

> > 15.
> >
> > You said in case of failure, cleanups is not needed if the process exits soon [1].
> > But some functions call PQfinish() then exit(1) or pg_fatal(). Should we follow?
> 
> Hmm, but doesn't this mean that the server will log an ugly message that
> "client closed connection unexpectedly"?  I think it's nicer to close
> the connection before terminating the process (especially since the
> code for that is already written).

OK. So we should disconnect properly even if the process exits. I added the function call
again. Note that PQclear() was not added because it is only related to the application.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

> Few comments regarding the documentation:
> 1) max_replication_slots information seems to be present couple of times:
> 
> +    <para>
> +     The target instance must have
> +     <link
> linkend="guc-max-replication-slots"><varname>max_replication_slots</varna
> me></link>
> +     and <link
> linkend="guc-max-logical-replication-workers"><varname>max_logical_replica
> tion_workers</varname></link>
> +     configured to a value greater than or equal to the number of target
> +     databases.
> +    </para>
>
> +   <listitem>
> +    <para>
> +     The target instance must have
> +     <link
> linkend="guc-max-replication-slots"><varname>max_replication_slots</varna
> me></link>
> +     configured to a value greater than or equal to the number of target
> +     databases and replication slots.
> +    </para>
> +   </listitem>

Fixed.

> 2) Can we add an id to prerequisites and use it instead of referring
> to r1-app-pg_createsubscriber-1:
> -     <application>pg_createsubscriber</application> checks if the
> given target data
> -     directory has the same system identifier than the source data directory.
> -     Since it uses the recovery process as one of the steps, it starts the
> -     target server as a replica from the source server. If the system
> -     identifier is not the same,
> <application>pg_createsubscriber</application> will
> -     terminate with an error.
> +     Checks the target can be converted.  In particular, things listed in
> +     <link linkend="r1-app-pg_createsubscriber-1">above section</link>
> would be
> +     checked.  If these are not met
> <application>pg_createsubscriber</application>
> +     will terminate with an error.
>      </para>

Changed.

> 3) The code also checks the following:
>  Verify if a PostgreSQL binary (progname) is available in the same
> directory as pg_createsubscriber.
> 
> But this is not present in the pre-requisites of documentation.

I think it is quite trivial, so I did not add it.

> 4) Here we mention that the target server should be stopped, but the
> same is not mentioned in prerequisites:
> +   Here is an example of using
> <application>pg_createsubscriber</application>.
> +   Before running the command, please make sure target server is stopped.
> +<screen>
> +<prompt>$</prompt> <userinput>pg_ctl -D /usr/local/pgsql/data
> stop</userinput>
> +</screen>
> +

Oh, it is the opposite: it should NOT be stopped. Fixed.

> 5) If there is an error during any of the pg_createsubscriber
> operation like if create subscription fails, it might not be possible
> to rollback to the earlier state which had physical-standby
> replication. I felt we should document this and also add it to the
> console message like how we do in case of pg_upgrade.

Added.

New version can be available in [1]

[1]:
https://www.postgresql.org/message-id/TYCPR01MB12077CD333376B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

> Few comments:
> 1) The below code can lead to assertion failure if the publisher is
> stopped while dropping the replication slot:
> +       if (primary_slot_name != NULL)
> +       {
> +               conn = connect_database(dbinfo[0].pubconninfo);
> +               if (conn != NULL)
> +               {
> +                       drop_replication_slot(conn, &dbinfo[0],
> primary_slot_name);
> +               }
> +               else
> +               {
> +                       pg_log_warning("could not drop replication
> slot \"%s\" on primary",
> +                                                  primary_slot_name);
> +                       pg_log_warning_hint("Drop this replication
> slot soon to avoid retention of WAL files.");
> +               }
> +               disconnect_database(conn);
> +       }
> 
> pg_createsubscriber: error: connection to database failed: connection
> to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or
> directory
> Is the server running locally and accepting connections on that socket?
> pg_createsubscriber: warning: could not drop replication slot
> "standby_1" on primary
> pg_createsubscriber: hint: Drop this replication slot soon to avoid
> retention of WAL files.
> pg_createsubscriber: pg_createsubscriber.c:432: disconnect_database:
> Assertion `conn != ((void *)0)' failed.
> Aborted (core dumped)
> 
> This is happening because we are calling disconnect_database in case
> of connection failure case too which has the following assert:
> +static void
> +disconnect_database(PGconn *conn)
> +{
> +       Assert(conn != NULL);
> +
> +       PQfinish(conn);
> +}

Right. disconnect_database() was moved into the if (conn != NULL) block.

> 2) There is a CheckDataVersion function which does exactly this, will
> we be able to use this:
> +       /* Check standby server version */
> +       if ((ver_fd = fopen(versionfile, "r")) == NULL)
> +               pg_fatal("could not open file \"%s\" for reading: %m",
> versionfile);
> +
> +       /* Version number has to be the first line read */
> +       if (!fgets(rawline, sizeof(rawline), ver_fd))
> +       {
> +               if (!ferror(ver_fd))
> +                       pg_fatal("unexpected empty file \"%s\"", versionfile);
> +               else
> +                       pg_fatal("could not read file \"%s\": %m", versionfile);
> +       }
> +
> +       /* Strip trailing newline and carriage return */
> +       (void) pg_strip_crlf(rawline);
> +
> +       if (strcmp(rawline, PG_MAJORVERSION) != 0)
> +       {
> +               pg_log_error("standby server is of wrong version");
> +               pg_log_error_detail("File \"%s\" contains \"%s\",
> which is not compatible with this program's version \"%s\".",
> +                                                       versionfile,
> rawline, PG_MAJORVERSION);
> +               exit(1);
> +       }
> +
> +       fclose(ver_fd);


> 3) Should this be added to typedefs.list:
> +enum WaitPMResult
> +{
> +       POSTMASTER_READY,
> +       POSTMASTER_STILL_STARTING
> +};

But the comment from Peter E. [1] was the opposite. I did not handle this.

> 4) pgCreateSubscriber should be mentioned after pg_controldata to keep
> the ordering consistency:
> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
> index aa94f6adf6..c5edd244ef 100644
> --- a/doc/src/sgml/reference.sgml
> +++ b/doc/src/sgml/reference.sgml
> @@ -285,6 +285,7 @@
>     &pgCtl;
>     &pgResetwal;
>     &pgRewind;
> +   &pgCreateSubscriber;
>     &pgtestfsync;

This has already been pointed out by Peter E. I did not handle this.

> 5) Here pg_replication_slots should be pg_catalog.pg_replication_slots:
> +       if (primary_slot_name)
> +       {
> +               appendPQExpBuffer(str,
> +                                                 "SELECT 1 FROM
> pg_replication_slots "
> +                                                 "WHERE active AND
> slot_name = '%s'",
> +                                                 primary_slot_name);

Fixed.

> 6) Here pg_settings should be pg_catalog.pg_settings:
> +        * - max_worker_processes >= 1 + number of dbs to be converted
> +
> *------------------------------------------------------------------------
> +        */
> +       res = PQexec(conn,
> +                                "SELECT setting FROM pg_settings
> WHERE name IN ("
> +                                "'max_logical_replication_workers', "
> +                                "'max_replication_slots', "
> +                                "'max_worker_processes', "
> +                                "'primary_slot_name') "
> +                                "ORDER BY name");

Fixed.

New version can be available in [2]

[1]: https://www.postgresql.org/message-id/3ee79f2c-e8b3-4342-857c-a31b87e1afda%40eisentraut.org
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077CD333376B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

> Few comments on the tests:
> 1) If the dry run was successful because of some issue then the server
> will be stopped so we can check for "pg_ctl status" if the server is
> running otherwise the connection will fail in this case. Another way
> would be to check if it does not have "postmaster was stopped"
> messages in the stdout.
> +
> +# Check if node S is still a standby
> +is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
> +       't', 'standby is in recovery');

Just to confirm: your suggestion is to add `pg_ctl status`, right? Added.

> 2) Can we add verification of  "postmaster was stopped" messages in
> the stdout for dry run without --databases testcase
> +# pg_createsubscriber can run without --databases option
> +command_ok(
> +       [
> +               'pg_createsubscriber', '--verbose',
> +               '--dry-run', '--pgdata',
> +               $node_s->data_dir, '--publisher-server',
> +               $node_p->connstr('pg1'), '--subscriber-server',
> +               $node_s->connstr('pg1')
> +       ],
> +       'run pg_createsubscriber without --databases');
> +

Hmm, in the case of --dry-run, the server would never be shut down.
See the code below:

```
    if (!dry_run)
        stop_standby_server(pg_ctl_path, opt.subscriber_dir);
```

> 3) This message "target server must be running" seems to be wrong,
> should it be cannot specify cascading replicating standby as standby
> node(this is for v22-0011 patch :
> +               'pg_createsubscriber', '--verbose', '--pgdata',
> $node_c->data_dir,
> +               '--publisher-server', $node_s->connstr('postgres'),
> +               '--port', $node_c->port, '--socketdir', $node_c->host,
> +               '--database', 'postgres'
>         ],
> -       'primary server is in recovery');
> +       1,
> +       [qr//],
> +       [qr/primary server cannot be in recovery/],
> +       'target server must be running');

Fixed.

> 4) Should this be "Wait for subscriber to catch up"
> +# Wait subscriber to catch up
> +$node_s->wait_for_subscription_sync($node_p, $subnames[0]);
> +$node_s->wait_for_subscription_sync($node_p, $subnames[1]);

Fixed.

> 5) Should this be 'Subscriptions has been created on all the specified
> databases'
> +);
> +is($result, qq(2),
> +       'Subscriptions has been created to all the specified databases'
> +);

Fixed, but "has" should be "have".

> 6) Add test to verify current_user is not a member of
> ROLE_PG_CREATE_SUBSCRIPTION, has no create permissions, has no
> permissions to execution replication origin advance functions
> 
> 7) Add tests to verify insufficient max_logical_replication_workers,
> max_replication_slots and max_worker_processes for the subscription
> node
> 
> 8) Add tests to verify invalid configuration of  wal_level,
> max_replication_slots and max_wal_senders for the publisher node

Hmm, but adding these checks may increase the test time. We should
measure the time and then decide.

> 9) We can use the same node name in comment and for the variable
> +# Set up node P as primary
> +$node_p = PostgreSQL::Test::Cluster->new('node_p');
> +$node_p->init(allows_streaming => 'logical');
> +$node_p->start;

Fixed.

> 10) Similarly we can use node_f instead of F in the comments.
> +# Set up node F as about-to-fail node
> +# Force it to initialize a new cluster instead of copying a
> +# previously initdb'd cluster.
> +{
> +       local $ENV{'INITDB_TEMPLATE'} = undef;
> +
> +       $node_f = PostgreSQL::Test::Cluster->new('node_f');
> +       $node_f->init(allows_streaming => 'logical');
> +       $node_f->start;
>

Fixed. Also, a recent commit [1] allows forcing initdb to initialize a new cluster, so I followed that.

The new patch is available in [2].

[1]: https://github.com/postgres/postgres/commit/ff9e1e764fcce9a34467d614611a34d4d2a91b50
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077CD333376B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
'Alvaro Herrera'
Date:
Hello,

On 2024-Feb-22, Hayato Kuroda (Fujitsu) wrote:

> Dear Alvaro,

> > Hmm, but doesn't this mean that the server will log an ugly message
> > that "client closed connection unexpectedly"?  I think it's nicer to
> > close the connection before terminating the process (especially
> > since the code for that is already written).
> 
> OK. So we should disconnect properly even if the process exits. I
> added the function call again. Note that PQclear() was not added
> because it is only related with the application.

Sounds about right, but I didn't verify the patches in detail.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)



Re: speed up a logical replica setup

From
vignesh C
Date:
On Thu, 22 Feb 2024 at 21:17, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> > Few comments on the tests:
> > 1) If the dry run was successful because of some issue then the server
> > will be stopped so we can check for "pg_ctl status" if the server is
> > running otherwise the connection will fail in this case. Another way
> > would be to check if it does not have "postmaster was stopped"
> > messages in the stdout.
> > +
> > +# Check if node S is still a standby
> > +is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
> > +       't', 'standby is in recovery');
>
> Just to confirm - your suggestion is to add `pg_ctl status`, right? Added.

Yes, that is correct.

> > 2) Can we add verification of  "postmaster was stopped" messages in
> > the stdout for dry run without --databases testcase
> > +# pg_createsubscriber can run without --databases option
> > +command_ok(
> > +       [
> > +               'pg_createsubscriber', '--verbose',
> > +               '--dry-run', '--pgdata',
> > +               $node_s->data_dir, '--publisher-server',
> > +               $node_p->connstr('pg1'), '--subscriber-server',
> > +               $node_s->connstr('pg1')
> > +       ],
> > +       'run pg_createsubscriber without --databases');
> > +
>
> Hmm, in case of --dry-run, the server would be never shut down.
> See below part.
>
> ```
>         if (!dry_run)
>                 stop_standby_server(pg_ctl_path, opt.subscriber_dir);
> ```

One way to tell whether the server ran in dry-run mode is to check
whether the server was stopped. That is, we can check that the stdout
does not contain a "postmaster was stopped" message. Can we add
validation based on this code:
+       if (action)
+               pg_log_info("postmaster was started");

Alternatively, we can check `pg_ctl status` to confirm that the server has not been shut down.

> > 3) This message "target server must be running" seems to be wrong,
> > should it be cannot specify cascading replicating standby as standby
> > node(this is for v22-0011 patch :
> > +               'pg_createsubscriber', '--verbose', '--pgdata',
> > $node_c->data_dir,
> > +               '--publisher-server', $node_s->connstr('postgres'),
> > +               '--port', $node_c->port, '--socketdir', $node_c->host,
> > +               '--database', 'postgres'
> >         ],
> > -       'primary server is in recovery');
> > +       1,
> > +       [qr//],
> > +       [qr/primary server cannot be in recovery/],
> > +       'target server must be running');
>
> Fixed.
>
> > 4) Should this be "Wait for subscriber to catch up"
> > +# Wait subscriber to catch up
> > +$node_s->wait_for_subscription_sync($node_p, $subnames[0]);
> > +$node_s->wait_for_subscription_sync($node_p, $subnames[1]);
>
> Fixed.
>
> > 5) Should this be 'Subscriptions has been created on all the specified
> > databases'
> > +);
> > +is($result, qq(2),
> > +       'Subscriptions has been created to all the specified databases'
> > +);
>
> Fixed, but "has" should be "have".
>
> > 6) Add test to verify current_user is not a member of
> > ROLE_PG_CREATE_SUBSCRIPTION, has no create permissions, has no
> > permissions to execution replication origin advance functions
> >
> > 7) Add tests to verify insufficient max_logical_replication_workers,
> > max_replication_slots and max_worker_processes for the subscription
> > node
> >
> > 8) Add tests to verify invalid configuration of  wal_level,
> > max_replication_slots and max_wal_senders for the publisher node
>
> Hmm, but adding these checks may increase the test time. we should
> measure the time and then decide.

We can check; if it does not take significantly more time, then
we can add these tests.

Regards,
Vignesh



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Feb 22, 2024, at 9:43 AM, Hayato Kuroda (Fujitsu) wrote:
> The possible solution would be
> 1) allow to run pg_createsubscriber if standby is initially stopped .
> I observed that pg_logical_createsubscriber also uses this approach.
> 2) read GUCs via SHOW command and restore them when server restarts
>

3. add a config-file option. That's similar to what pg_rewind does. I expect
that Debian-based installations will have this issue.

I also prefer the first solution.
Another reason the standby should be stopped is backups.
Basically, the standby instance should be saved before running pg_createsubscriber.
The easiest way is a hard copy, and the postmaster should be stopped at that time.
I felt it is better if users can run the command immediately after copying.
Thoughts?

It is not a good idea if you want to keep postgresql.conf outside PGDATA.
I mean, you need extra steps that can be error-prone (different settings between
files).

Shlok, I didn't read your previous email carefully. :-/


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote:
I found some issues and fixed them with the top-up patches
v23-0012 and v23-0013.
1.
Suppose there is a cascading physical replication setup node1->node2->node3.
Now if we run pg_createsubscriber with node1 as primary and node2 as
standby, pg_createsubscriber will succeed, but the connection
between node2 and node3 will not be retained, and the log of node3 will
show this error:
2024-02-20 12:32:12.340 IST [277664] FATAL:  database system
identifier differs between the primary and standby
2024-02-20 12:32:12.340 IST [277664] DETAIL:  The primary's identifier
is 7337575856950914038, the standby's identifier is
7337575783125171076.
2024-02-20 12:32:12.341 IST [277491] LOG:  waiting for WAL to become
available at 0/3000F10

To fix this, I prevent pg_createsubscriber from running if the standby
node is a primary for any other server.
Made the change in the v23-0012 patch.

IIRC we already discussed the cascading replication scenario. Of course,
breaking a node is not good; that's why you proposed v23-0012. However,
preventing pg_createsubscriber from running if there are standbys attached is
also annoying. If you don't have access to these hosts you need to (a) kill the
walsender (very fragile / unstable), (b) start with max_wal_senders = 0, or (c)
add a firewall rule to prevent these hosts from establishing a connection
to the target server. I wouldn't like to include the patch as-is. IMO we need
at least one message explaining the situation to the user, I mean, a hint
message.  I'm resistant to a new option, but perhaps a --force option is an
answer. There is no test coverage for it. I adjusted this patch (didn't include
the --force option) and added a test case.

2.
While checking 'max_replication_slots' in the 'check_publisher' function,
we do not account for the temporary slot in the check:
+   if (max_repslots - cur_repslots < num_dbs)
+   {
+       pg_log_error("publisher requires %d replication slots, but
only %d remain",
+                    num_dbs, max_repslots - cur_repslots);
+       pg_log_error_hint("Consider increasing max_replication_slots
to at least %d.",
+                         cur_repslots + num_dbs);
+       return false;
+   }
Fixed this in v23-0013

Good catch!

Both are included in the next patch.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Feb 23, 2024 at 8:16 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote:
>
> I found some issues and fixed those issues with top up patches
> v23-0012 and v23-0013
> 1.
> Suppose there is a cascade physical replication node1->node2->node3.
> Now if we run pg_createsubscriber with node1 as primary and node2 as
> standby, pg_createsubscriber will be successful but the connection
> between node2 and node3 will not be retained and log og node3 will
> give error:
> 2024-02-20 12:32:12.340 IST [277664] FATAL:  database system
> identifier differs between the primary and standby
> 2024-02-20 12:32:12.340 IST [277664] DETAIL:  The primary's identifier
> is 7337575856950914038, the standby's identifier is
> 7337575783125171076.
> 2024-02-20 12:32:12.341 IST [277491] LOG:  waiting for WAL to become
> available at 0/3000F10
>
> To fix this I am avoiding pg_createsubscriber to run if the standby
> node is primary to any other server.
> Made the change in v23-0012 patch
>
>
> IIRC we already discussed the cascading replication scenario. Of course,
> breaking a node is not good that's why you proposed v23-0012. However,
> preventing pg_createsubscriber to run if there are standbys attached to it is
> also annoying. If you don't access to these hosts you need to (a) kill
> walsender (very fragile / unstable), (b) start with max_wal_senders = 0 or (3)
> add a firewall rule to prevent that these hosts do not establish a connection
> to the target server. I wouldn't like to include the patch as-is. IMO we need
> at least one message explaining the situation to the user, I mean, add a hint
> message.
>

Yeah, it could be a bit tricky for users to ensure that no follow-on
standby is present, but I think it is still better to give an error and
prohibit running pg_createsubscriber than to break the existing
replication. A possible solution, in this case, is to allow running
pg_basebackup via this tool or otherwise, and then let the user use it
to convert to a subscriber. It would be good to keep things simple for
the first version; then we can add options like --force in
subsequent versions.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

I forgot to reply one of the suggestion.

> 2) There is a CheckDataVersion function which does exactly this, will
> we be able to use this:
> +       /* Check standby server version */
> +       if ((ver_fd = fopen(versionfile, "r")) == NULL)
> +               pg_fatal("could not open file \"%s\" for reading: %m",
> versionfile);
> +
> +       /* Version number has to be the first line read */
> +       if (!fgets(rawline, sizeof(rawline), ver_fd))
> +       {
> +               if (!ferror(ver_fd))
> +                       pg_fatal("unexpected empty file \"%s\"", versionfile);
> +               else
> +                       pg_fatal("could not read file \"%s\": %m", versionfile);
> +       }
> +
> +       /* Strip trailing newline and carriage return */
> +       (void) pg_strip_crlf(rawline);
> +
> +       if (strcmp(rawline, PG_MAJORVERSION) != 0)
> +       {
> +               pg_log_error("standby server is of wrong version");
> +               pg_log_error_detail("File \"%s\" contains \"%s\",
> which is not compatible with this program's version \"%s\".",
> +                                                       versionfile,
> rawline, PG_MAJORVERSION);
> +               exit(1);
> +       }
> +
> +       fclose(ver_fd);


I did not adopt this suggestion because I was not sure where to put the
declaration and the implementation. Functions that can be called from client
programs must be declared in the src/include/{common|fe_utils|utils} directories.
I looked at the files located there but could not find an appropriate location for
CheckDataVersion. Also, I did not think a new file should be created only for this function.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

>> The possible solution would be
>> 1) allow to run pg_createsubscriber if standby is initially stopped .
>> I observed that pg_logical_createsubscriber also uses this approach.
>> 2) read GUCs via SHOW command and restore them when server restarts
>>
>3. add a config-file option. That's similar to what pg_rewind does.

Sorry, which pg_rewind option did you mean? I cannot find it.
IIUC, -l is the only option that accepts a path, but it is not related to our case.

Also, I'm not sure of the benefit of adding new options; basically, there should be
as few as possible. Is there any benefit over reading via SHOW? Even if pg_resetwal
had such an option, the reason would be that the target postmaster for pg_resetwal must be stopped.

>I expect
>that Debian-based installations will have this issue.

I'm not familiar with the Debian environment, so could you explain the reason?

>It was not a good idea if you want to keep the postgresql.conf outside PGDATA.
>I mean you need extra steps that can be error prone (different settings between
>files).

Yeah, if we use my approach, users who specify such GUCs may not be happy.
So, based on the above discussion, we should choose one of the items below. Thoughts?

a)
require that the standby be *stopped*; options like config_file can be specified via command-line options.
b)
require that the standby be *running*; options like config_file would be read via the SHOW command.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

> Sorry, which pg_rewind option did you mention? I cannot find.
> IIUC, -l is an only option which can accept the path, but it is not related with us.

Sorry, I found it [1]. I had confused it with pg_resetwal, but my opinion has not really changed.
The reason --config-file exists is that pg_rewind requires the target to be stopped,
which differs from the current pg_createsubscriber. So I still prefer not to add the option.

[1]: https://www.postgresql.org/docs/current/app-pgrewind.html#:~:text=%2D%2Dconfig%2Dfile%3Dfilename
[2]:
```
The target server must be shut down cleanly before running pg_rewind
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
Andrew Dunstan
Date:
On 2024-02-27 Tu 05:02, Hayato Kuroda (Fujitsu) wrote:
> Dear Euler,
>
>> Sorry, which pg_rewind option did you mention? I cannot find.
>> IIUC, -l is an only option which can accept the path, but it is not related with us.
> Sorry, I found [1]. I was confused with pg_resetwal. But my opinion was not so changed.
> The reason why --config-file exists is that pg_rewind requires that target must be stopped,
> which is different from the current pg_createsubscriber. So I still do not like to add options.
>
> [1]: https://www.postgresql.org/docs/current/app-pgrewind.html#:~:text=%2D%2Dconfig%2Dfile%3Dfilename
> [2]:
> ```
> The target server must be shut down cleanly before running pg_rewind
> ```
>

Even though that is a difference, I'd still rather we did more or less
the same thing the same way across utilities, so I agree
with Euler's suggestion.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote:
Based on Euler's idea, I roughly implemented it. Thoughts?

0001-0013 were not changed from the previous version.

V24-0014: addressed your comment in the replied e-mail.
V24-0015: Add disconnect_database() again, per [3]
V24-0016: addressed your comment in [4].
V24-0017: addressed your comment in [5].
V24-0018: addressed your comment in [6].

Thanks for your review. I'm attaching v25 that hopefully addresses all pending
points.

Regarding your comments [1] on v21, I included changes for almost all items
except 2, 20, 23, 24, and 25. I still think item 2 is not required because
pg_ctl will provide a suitable message. I decided not to rearrange the block of
SQL commands (item 20) mainly because it would avoid these objects on node_f.
Do we really need command_checks_all? Depending on the output, it uses
more cycles than command_ok.

In summary:

v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, I
checked what you wrote and fixed some gaps in what I had written.
v24-0003: as I said I don't think we need to add it, however, I won't fight
against it if people want to add this check.
v24-0004: I spent some time on it. This patch was not complete. I cleaned it up
and included the start_standby_server code. It starts the server using the
specified socket directory, port and username, hence preventing external client
connections during the execution.
v24-0005: partially applied
v24-0006: applied with cosmetic change
v24-0007: applied with cosmetic change
v24-0008: applied
v24-0009: applied with cosmetic change
v24-0010: not applied. Based on #15, I refactored this code a bit. pg_fatal is
not used when there is a database connection open. Instead, pg_log_error() is
followed by disconnect_database(). In cases where it should exit immediately,
disconnect_database() has a new parameter (exit_on_error) that controls whether it
needs to call exit(1). I went ahead and did the same for connect_database().
v24-0011: partially applied. I included some of the suggestions (18, 19, and 21).
v24-0012: not applied. On reflection, after working on v24-0004, the target
server will start with new parameters (that only accept local connections);
hence, during the execution it is no longer possible to detect whether the target
server is a primary for another server. I added a sentence about it in the
documentation (Warning section).
v24-0013: good catch. Applied.
v24-0014: partially applied. After some experiments I decided to use a small
number of attempts. The current code didn't reset the counter if the connection
is reestablished. I included the documentation suggestion. I didn't include the
IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead,
I refactored the drop_publication() to not try again if the DROP PUBLICATION
failed.
v24-0015: not applied. I refactored the exit code to do the right thing: print
error message, disconnect database (if applicable) and exit.
v24-0016: not applied. But checked if the information was presented in the
documentation; it is.
v24-0017: good catch. Applied.
v24-0018: not applied.

I removed almost all boolean returns and included the error logic inside the
functions. It reads better. I also changed the connect|disconnect_database
functions to include the error logic inside them. There is a new parameter
on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I
think someone suggested it -- and the error message was moved outside the
function. I improved the cleanup routine. It provides useful information if it
cannot remove the object (publication or replication slot) from the primary.

Since I applied v24-0004, I realized that extra service starts / stops are
required. It means pg_createsubscriber doesn't start the transformation with the
current standby settings. Instead, it stops the standby if it is running and
starts it with the provided command-line options (socket, port,
listen_addresses). It has a few drawbacks:
* See v24-0012. It cannot detect if the target server is a primary for another
  server. It is documented.
* I also removed the check that the standby is running. If the standby was stopped a
  long time ago, it will take some time to reach the start point.
* Dry run mode has to start / stop the service to work correctly. Is that an
  issue?

Although I decided to include the --retain option, I'm thinking about removing it.
If logging is enabled, the information from the pg_createsubscriber run will
be available. The client log can be redirected to a file for future inspection.

Comments?




--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Sat, Mar 2, 2024 at 2:19 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote:
>
> Based on idea from Euler, I roughly implemented. Thought?
>
> 0001-0013 were not changed from the previous version.
>
> V24-0014: addressed your comment in the replied e-mail.
> V24-0015: Add disconnect_database() again, per [3]
> V24-0016: addressed your comment in [4].
> V24-0017: addressed your comment in [5].
> V24-0018: addressed your comment in [6].
>
>
> Thanks for your review. I'm attaching v25 that hopefully addresses all pending
> points.
>
> Regarding your comments [1] on v21, I included changes for almost all items
> except 2, 20, 23, 24, and 25. It still think item 2 is not required because
> pg_ctl will provide a suitable message. I decided not to rearrange the block of
> SQL commands (item 20) mainly because it would avoid these objects on node_f.
> Do we really need command_checks_all? Depending on the output it uses
> additional cycles than command_ok.
>
> In summary:
>
> v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, I
> checked what you wrote and fix some gaps in what I've been written.
> v24-0003: as I said I don't think we need to add it, however, I won't fight
> against it if people want to add this check.
> v24-0004: I spent some time on it. This patch is not completed. I cleaned it up
> and include the start_standby_server code. It starts the server using the
> specified socket directory, port and username, hence, preventing external client
> connections during the execution.
> v24-0005: partially applied
> v24-0006: applied with cosmetic change
> v24-0007: applied with cosmetic change
> v24-0008: applied
> v24-0009: applied with cosmetic change
> v24-0010: not applied. Base on #15, I refactored this code a bit. pg_fatal is
> not used when there is a database connection open. Instead, pg_log_error()
> followed by disconnect_database(). In cases that it should exit immediately,
> disconnect_database() has a new parameter (exit_on_error) that controls if it
> needs to call exit(1). I go ahead and did the same for connect_database().
> v24-0011: partially applied. I included some of the suggestions (18, 19, and 21).
> v24-0012: not applied. Under reflection, after working on v24-0004, the target
> server will start with new parameters (that only accepts local connections),
> hence, during the execution is not possible anymore to detect if the target
> server is a primary for another server. I added a sentence for it in the
> documentation (Warning section).
> v24-0013: good catch. Applied.
> v24-0014: partially applied. After some experiments I decided to use a small
> number of attempts. The current code didn't reset the counter if the connection
> is reestablished. I included the documentation suggestion. I didn't include the
> IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead,
> I refactored the drop_publication() to not try again if the DROP PUBLICATION
> failed.
> v24-0015: not applied. I refactored the exit code to do the right thing: print
> error message, disconnect database (if applicable) and exit.
> v24-0016: not applied. But checked if the information was presented in the
> documentation; it is.
> v24-0017: good catch. Applied.
> v24-0018: not applied.
>
> I removed almost all boolean return and include the error logic inside the
> function. It reads better. I also changed the connect|disconnect_database
> functions to include the error logic inside it. There is a new parameter
> on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I
> think someone suggested it -- and the error message was moved to outside the
> function. I improved the cleanup routine. It provides useful information if it
> cannot remove the object (publication or replication slot) from the primary.
>
> Since I applied v24-0004, I realized that extra start / stop service are
> required. It mean pg_createsubscriber doesn't start the transformation with the
> current standby settings. Instead, it stops the standby if it is running and
> start it with the provided command-line options (socket, port,
> listen_addresses). It has a few drawbacks:
> * See v34-0012. It cannot detect if the target server is a primary for another
>   server. It is documented.
> * I also removed the check for standby is running. If the standby was stopped a
>   long time ago, it will take some time to reach the start point.
> * Dry run mode has to start / stop the service to work correctly. Is it an
>   issue?
>
> However, I decided to include --retain option, I'm thinking about to remove it.
> If the logging is enabled, the information during the pg_createsubscriber will
> be available. The client log can be redirected to a file for future inspection.
>
> Comments?

I applied the v25 patch and ran the 'pg_createsubscriber' command.
I was unable to execute it and got the following error:

$ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432
dbname=postgres"  -d postgres -d db1 -p 9000 -r
./pg_createsubscriber: invalid option -- 'p'
pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more
information.

Here, -p is not accepting the desired port number. Thoughts?

Thanks and Regards,
Shubham Khanna.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,
> I applied the v25 patch and did RUN the 'pg_createsubscriber' command.
> I was unable to execute it and experienced the following error:
>
> $ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432
> dbname=postgres"  -d postgres -d db1 -p 9000 -r
> ./pg_createsubscriber: invalid option -- 'p'
> pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more
> information.
>
> Here, the --p is not accepting the desired port number. Thoughts?

I investigated it and found that:

+ while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v",
+             long_options, &option_index)) != -1)
+ {

Here the 'p', 's' and 'U' options are missing, so we get the error.
Also, I think the 'S' option should be removed from here.

I also confirmed that specifying the long options --subscriber-port,
--subscriber-username and --socket-directory works fine.
Thoughts?

Thanks and regards,
Shlok Kyal



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Mar 5, 2024, at 12:48 AM, Shubham Khanna wrote:
I applied the v25 patch and ran the 'pg_createsubscriber' command.
I was unable to execute it and got the following error:

$ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432
dbname=postgres"  -d postgres -d db1 -p 9000 -r
./pg_createsubscriber: invalid option -- 'p'
pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more
information.

Oops. Good catch! I will post an updated patch soon.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch!

>v24-0003: as I said I don't think we need to add it, however, I won't fight
>against it if people want to add this check.

OK, let's wait for comments from senior members.

>Since I applied v24-0004, I realized that extra start / stop service are
>required. It mean pg_createsubscriber doesn't start the transformation with the
>current standby settings. Instead, it stops the standby if it is running and
>start it with the provided command-line options (socket, port,
>listen_addresses). It has a few drawbacks:
>* See v34-0012. It cannot detect if the target server is a primary for another
>  server. It is documented.

Yeah, it is collateral damage.

>* I also removed the check for standby is running. If the standby was stopped a
>  long time ago, it will take some time to reach the start point.
>* Dry run mode has to start / stop the service to work correctly. Is it an
>  issue?

One concern (see the comment below) is that the -l option would not be passed even if
the standby had been logging before running pg_createsubscriber. Also, some settings
passed via pg_ctl start -o ... would not be restored.

>However, I decided to include --retain option, I'm thinking about to remove it.
>If the logging is enabled, the information during the pg_createsubscriber will
>be available. The client log can be redirected to a file for future inspection.

Just to confirm, you meant the following, right?
* the client output would be redirected, and
* -r option would be removed.

Here are my initial comments for v25-0001. I read the new doc and it looks very good.
I may review v25-0001 further, but feel free to revise.

01. cleanup_objects_atexit
```
    PGconn       *conn;
    int            i;
```
The declaration of *conn can be moved into the for loop. Also, the loop index can be declared inside the for statement.

02. cleanup_objects_atexit
```

                /*
                 * If a connection could not be established, inform the user
                 * that some objects were left on primary and should be
                 * removed before trying again.
                 */
                if (dbinfo[i].made_publication)
                {
                    pg_log_warning("There might be a publication \"%s\" in database \"%s\" on primary",
                                   dbinfo[i].pubname, dbinfo[i].dbname);
                    pg_log_warning_hint("Consider dropping this publication before trying again.");
                }
                if (dbinfo[i].made_replslot)
                {
                    pg_log_warning("There might be a replication slot \"%s\" in database \"%s\" on primary",
                                   dbinfo[i].subname, dbinfo[i].dbname);
                    pg_log_warning_hint("Drop this replication slot soon to avoid retention of WAL files.");
                }
```

Not sure which is better, but we may be able to write the list to a concrete file, like pg_upgrade does.
(I thought this had already been discussed, but I could not find it in the archive. Sorry if this is a duplicate comment.)

03. main
```
    while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v",
                            long_options, &option_index)) != -1)
```

Missing update for __shortopts.

04. main
```
            case 'D':
                opt.subscriber_dir = pg_strdup(optarg);
                canonicalize_path(opt.subscriber_dir);
                break;
...
            case 'P':
                opt.pub_conninfo_str = pg_strdup(optarg);
                break;
...
            case 's':
                opt.socket_dir = pg_strdup(optarg);
                break;
...
            case 'U':
                opt.sub_username = pg_strdup(optarg);
                break;
```

Should we consider the case where these options are specified twice?
I.e., should we call pg_free() before the assignment?
 
05. main

Missing canonicalize_path() to the socket_dir.

06. main
```
    /*
     * If socket directory is not provided, use the current directory.
     */
```

A one-line comment can be used here; the trailing period can also be removed at that time.

07. main
```
    /*
     *
     * If subscriber username is not provided, check if the environment
     * variable sets it. If not, obtain the operating system name of the user
     * running it.
     */
```
Unnecessary blank.

08. main
```
        char       *errstr = NULL;
```
 
This declaration can be at else-part.

09. main.

Also, in the first place, do we have to obtain the username if it is not specified?
I think libpq can handle that case if we simply skip passing the info.

10. main
```
    appendPQExpBuffer(sub_conninfo_str, "host=%s port=%u user=%s fallback_application_name=%s",
                      opt.socket_dir, opt.sub_port, opt.sub_username, progname);
    sub_base_conninfo = get_base_conninfo(sub_conninfo_str->data, NULL);
```

Is it really necessary to call get_base_conninfo()? I think there is no need to
define sub_base_conninfo.

11. main

```
    /*
     * In dry run mode, the server is restarted with the provided command-line
     * options so validation can be applied in the target server. In order to
     * preserve the initial state of the server (running), start it without
     * the command-line options.
     */
    if (dry_run)
        start_standby_server(&opt, pg_ctl_path, NULL, false);
```

I think the initial state of the server may also be stopped; currently both conditions are allowed.
And I think it is not good to leave the logfile unspecified.

12. others

As Peter E pointed out [1], the main function is still huge; it has more than 400 lines.
I think all functions should have fewer than 100 lines to keep them readable.

I considered a separation like the one below. Note that this may require
changing the ordering. What do you think?

* add parse_command_options() which accepts user options and verifies them
* add verification_phase() or something which checks system identifier and calls check_XXX
* add catchup_phase() or something which creates a temporary slot, writes recovery parameters,
  and wait until the end of recovery
* add cleanup_phase() or something which removes primary_slot and modifies the
  system identifier
* stop/start server can be combined into one wrapper.

The attached txt file is a proof of concept.

13. others

PQresultStatus(res) is called 17 times in this source code, which seems redundant.
I think we can introduce a function like executeQueryOrDie() and gather the checks in one place.

14. others

I found that pg_createsubscriber does not use functions declared in other files.
Is there a possibility of reusing some, e.g., from streamutils.h?

15. others 

While reading the old discussions [2], I saw that Amit suggested keeping the comment
and avoiding the creation of a temporary slot. You said "Got it", but the temporary
slot still exists. Is there a reason? Could you clarify your opinion?

16. others

While reading [2] and [3], I was confused by the decision. You and Amit discussed
combining pg_createsubscriber with slot sync and how to handle slots on the
physical standby. You seemed to agree to remove such a slot, and Amit also
suggested raising an ERROR. However, you said in [8] that such handling is not
mandatory, so a WARNING should be raised in dry_run mode. I was quite confused.
Am I missing something?

17. others

Per the discussion around [4], we might have to consider the case where some options
like data_directory and config_file were initially specified for the standby server.
Another easy approach is to allow users to specify options like -o in pg_upgrade [5],
which is similar to your idea. Thoughts?


18. others

How do you handle the reported failure [6]?

19. main
```
    char       *pub_base_conninfo = NULL;
    char       *sub_base_conninfo = NULL;
    char       *dbname_conninfo = NULL;
```

No need to initialize pub_base_conninfo and sub_base_conninfo.
These variables would not be free'd.

20. others

IIUC, slot creation will not finish if there are prepared transactions.
Should we detect this in the verification phase and raise an ERROR?

21. others

As I said in [7], the catch-up would not finish if a long recovery_min_apply_delay
is used. Should we override it during the catch-up?

22. pg_createsubscriber.sgml
```
    <para>
     Check
     Write recovery parameters into the target data...
```

I'm not sure, but "Check" seems unneeded.

[1]: https://www.postgresql.org/message-id/b9aa614c-84ba-a869-582f-8d5e3ab57424%40enterprisedb.com
[2]: https://www.postgresql.org/message-id/9fd3018d-0e5f-4507-aee6-efabfb5a4440%40app.fastmail.com
[3]: https://www.postgresql.org/message-id/CAA4eK1L%2BE-bdKaOMSw-yWizcuprKMyeejyOwWjq_57%3DUqh-f%2Bg%40mail.gmail.com
[4]:
https://www.postgresql.org/message-id/TYCPR01MB12077B63D81B49E9DFD323661F55A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[5]:
https://www.postgresql.org/docs/devel/pgupgrade.html#:~:text=options%20to%20be%20passed%20directly%20to%20the%20old%20postgres%20command%3B%20multiple%20option%20invocations%20are%20appended
[6]: https://www.postgresql.org/message-id/CAHv8Rj%2B5mzK9Jt%2B7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g%40mail.gmail.com
[7]:
https://www.postgresql.org/message-id/OS3PR01MB98828B15DD9502C91E0C50D7F57D2%40OS3PR01MB9882.jpnprd01.prod.outlook.com
[8]: https://www.postgresql.org/message-id/be92c57b-82e1-4920-ac31-a8a04206db7b%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Sat, 2 Mar 2024 at 02:19, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote:
>
> Based on an idea from Euler, I roughly implemented it. Thoughts?
>
> 0001-0013 were not changed from the previous version.
>
> V24-0014: addressed your comment in the replied e-mail.
> V24-0015: Add disconnect_database() again, per [3]
> V24-0016: addressed your comment in [4].
> V24-0017: addressed your comment in [5].
> V24-0018: addressed your comment in [6].
>
>
> Thanks for your review. I'm attaching v25 that hopefully addresses all pending
> points.
>
> Regarding your comments [1] on v21, I included changes for almost all items
> except 2, 20, 23, 24, and 25. I still think item 2 is not required because
> pg_ctl will provide a suitable message. I decided not to rearrange the block of
> SQL commands (item 20), mainly because it would avoid creating these objects on
> node_f. Do we really need command_checks_all? Depending on the output, it uses
> more cycles than command_ok.
>
> In summary:
>
> v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, I
> checked what you wrote and fixed some gaps in what I had written.
> v24-0003: as I said I don't think we need to add it, however, I won't fight
> against it if people want to add this check.
> v24-0004: I spent some time on it. This patch is not completed. I cleaned it up
> and include the start_standby_server code. It starts the server using the
> specified socket directory, port and username, hence, preventing external client
> connections during the execution.
> v24-0005: partially applied
> v24-0006: applied with cosmetic change
> v24-0007: applied with cosmetic change
> v24-0008: applied
> v24-0009: applied with cosmetic change
> v24-0010: not applied. Based on #15, I refactored this code a bit. pg_fatal is
> not used when there is a database connection open. Instead, pg_log_error()
> followed by disconnect_database(). In cases that it should exit immediately,
> disconnect_database() has a new parameter (exit_on_error) that controls if it
> needs to call exit(1). I go ahead and did the same for connect_database().
> v24-0011: partially applied. I included some of the suggestions (18, 19, and 21).
> v24-0012: not applied. On reflection, after working on v24-0004, the target
> server will start with new parameters (that only accept local connections);
> hence, during the execution it is no longer possible to detect whether the
> target server is a primary for another server. I added a sentence about it in
> the documentation (Warning section).
> v24-0013: good catch. Applied.
> v24-0014: partially applied. After some experiments I decided to use a small
> number of attempts. The current code didn't reset the counter if the connection
> is reestablished. I included the documentation suggestion. I didn't include the
> IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead,
> I refactored the drop_publication() to not try again if the DROP PUBLICATION
> failed.
> v24-0015: not applied. I refactored the exit code to do the right thing: print
> error message, disconnect database (if applicable) and exit.
> v24-0016: not applied. But checked if the information was presented in the
> documentation; it is.
> v24-0017: good catch. Applied.
> v24-0018: not applied.
>
> I removed almost all boolean return and include the error logic inside the
> function. It reads better. I also changed the connect|disconnect_database
> functions to include the error logic inside it. There is a new parameter
> on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I
> think someone suggested it -- and the error message was moved to outside the
> function. I improved the cleanup routine. It provides useful information if it
> cannot remove the object (publication or replication slot) from the primary.
>
> Since I applied v24-0004, I realized that extra server starts / stops are
> required. It means pg_createsubscriber doesn't start the transformation with the
> current standby settings. Instead, it stops the standby if it is running and
> starts it with the provided command-line options (socket, port,
> listen_addresses). It has a few drawbacks:
> * See v24-0012. It cannot detect if the target server is a primary for another
>   server. It is documented.
> * I also removed the check that the standby is running. If the standby was stopped a
>   long time ago, it will take some time to reach the start point.
> * Dry run mode has to start / stop the service to work correctly. Is it an
>   issue?
>
> Although I decided to include the --retain option, I'm thinking about removing it.
> If logging is enabled, the information produced while pg_createsubscriber runs will
> be available. The client log can be redirected to a file for future inspection.
>
> Comments?

Few comments:
1)   Can we use strdup here instead of atoi, as pg_dump does? Otherwise we do
a double conversion: to an integer using atoi and back to a string while
forming the connection string:
+                       case 'p':
+                               if ((opt.sub_port = atoi(optarg)) <= 0)
+                                       pg_fatal("invalid subscriber
port number");
+                               break;

2) We could enforce a valid range for this; otherwise we end up with
unexpected values when a large number is specified:
+                       case 't':
+                               opt.recovery_timeout = atoi(optarg);
+                               break;

3) Now that we have addressed most of the items, can we handle this TODO:
+               /*
+                * TODO use primary_conninfo (if available) from subscriber and
+                * extract publisher connection string. Assume that there are
+                * identical entries for physical and logical
replication. If there is
+                * not, we would fail anyway.
+                */
+               pg_log_error("no publisher connection string specified");
+               pg_log_error_hint("Try \"%s --help\" for more
information.", progname);
+               exit(1);

4)  The default log level is info here; I was not sure how to set it to
debug level to get these debug messages:
+               pg_log_debug("publisher(%d): connection string: %s",
i, dbinfo[i].pubconninfo);
+               pg_log_debug("subscriber(%d): connection string: %s",
i, dbinfo[i].subconninfo);

5) Currently, in non-verbose mode, no messages are printed to the console.
We could print a few of them regardless of the verbose setting, like the
following:
a) creating publication
b) creating replication slot
c) waiting for the target server to reach the consistent state
d) If pg_createsubscriber fails after this point, you must recreate
the physical replica before continuing.
e) creating subscription

6) The message should be "waiting for the target server to reach the
consistent state":
+#define NUM_CONN_ATTEMPTS      5
+
+       pg_log_info("waiting the target server to reach the consistent state");
+
+       conn = connect_database(conninfo, true);

Regards,
Vignesh



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Mar 6, 2024, at 7:02 AM, Hayato Kuroda (Fujitsu) wrote:
Thanks for updating the patch!

Thanks for the feedback. I'm attaching v26 that addresses most of your comments
and some issues pointed by Vignesh [1].

>* I also removed the check for standby is running. If the standby was stopped a
>  long time ago, it will take some time to reach the start point.
>* Dry run mode has to start / stop the service to work correctly. Is it an
>  issue?

One concern (see the comment below) is that the -l option would not be passed even if
the standby had been logging before pg_createsubscriber was run. Also, some settings
passed by pg_ctl start -o .... would not be restored.

That's a good point. We should state in the documentation that GUCs specified in
the command-line options are ignored during the execution.

>Although I decided to include the --retain option, I'm thinking about removing it.
>If logging is enabled, the information produced while pg_createsubscriber runs will
>be available. The client log can be redirected to a file for future inspection.

Just to confirm: you mean the following, right?
* the client output would be redirected, and
* the -r option would be removed.

Yes. The logging_collector is usually enabled, or syslog is collecting the log
entries. On reflection, another log directory to store entries for a short
period of time doesn't seem like a good idea. It divides the information and it
also costs development time. The questions that made me think about it were:
Should I remove the pg_createsubscriber_output.d directory if it runs
successfully? What if there is an old file there? Is it another directory to
exclude while taking a backup? I also don't like the long directory name.

Here are my initial comments for v25-0001. I read the new doc and it looks very good.
I may review v25-0001 further, but feel free to revise in the meantime.

01. cleanup_objects_atexit
```
PGconn    *conn;
int i;
```
The declaration of *conn can be moved into the for loop. The loop index can also be declared within the for statement.

Changed.

02. cleanup_objects_atexit
```

/*
* If a connection could not be established, inform the user
* that some objects were left on primary and should be
* removed before trying again.
*/
if (dbinfo[i].made_publication)
{
pg_log_warning("There might be a publication \"%s\" in database \"%s\" on primary",
   dbinfo[i].pubname, dbinfo[i].dbname);
pg_log_warning_hint("Consider dropping this publication before trying again.");
}
if (dbinfo[i].made_replslot)
{
pg_log_warning("There might be a replication slot \"%s\" in database \"%s\" on primary",
   dbinfo[i].subname, dbinfo[i].dbname);
pg_log_warning_hint("Drop this replication slot soon to avoid retention of WAL files.");
}
```

Not sure which is better, but we may be able to write the list to a concrete file, as pg_upgrade does.
(I thought this had already been discussed, but could not find it in the archive. Sorry if it is a duplicated comment.)

Do you mean the replication slot file? I think the replication slot name and the
server (primary) are sufficient for checking and fixing it if required.

03. main
```
while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v",
long_options, &option_index)) != -1)
```

Missing update for __shortopts.

Fixed.

04. main
```
case 'D':
opt.subscriber_dir = pg_strdup(optarg);
canonicalize_path(opt.subscriber_dir);
break;
...
case 'P':
opt.pub_conninfo_str = pg_strdup(optarg);
break;
...
case 's':
opt.socket_dir = pg_strdup(optarg);
break;
...
case 'U':
opt.sub_username = pg_strdup(optarg);
break;
```

Should we consider the case where these options are specified twice?
I.e., should we call pg_free() before the assignment?

It isn't a concern for the other client tools. I think the reason is that the
leak doesn't keep growing during the execution. I wouldn't bother with it.

05. main

Missing canonicalize_path() to the socket_dir.

Fixed.

06. main
```
/*
* If socket directory is not provided, use the current directory.
*/
```

A one-line comment can be used here; the trailing period can also be removed at that time.

Fixed.

07. main
```
/*
*
* If subscriber username is not provided, check if the environment
* variable sets it. If not, obtain the operating system name of the user
* running it.
*/
```
Unnecessary blank.

Fixed.

08. main
```
char    *errstr = NULL;
```
 
This declaration can be at else-part.

Fixed.

09. main.

Also, in the first place, do we have to obtain the username if it is not
specified? I think libpq can handle that case if we simply skip passing the info.

Are you suggesting that the username should be optional?

10. main
```
appendPQExpBuffer(sub_conninfo_str, "host=%s port=%u user=%s fallback_application_name=%s",
  opt.socket_dir, opt.sub_port, opt.sub_username, progname);
sub_base_conninfo = get_base_conninfo(sub_conninfo_str->data, NULL);
```

Is it really necessary to call get_base_conninfo()? I think there is no need to
define sub_base_conninfo.

No. Good catch. I removed it.

11. main

```
/*
* In dry run mode, the server is restarted with the provided command-line
* options so validation can be applied in the target server. In order to
* preserve the initial state of the server (running), start it without
* the command-line options.
*/
if (dry_run)
start_standby_server(&opt, pg_ctl_path, NULL, false);
```

I think the initial state of the server may also be stopped; currently both conditions are allowed.
And I think it is not good to leave the logfile unspecified.

Indeed, it is. I don't consider the following test to be the first step because
it is conditional; stopping the server is a precondition for starting it. The
first step is starting the standby server, so the initial state is stopped.

    /*   
     * If the standby server is running, stop it. Some parameters (that can
     * only be set at server start) are informed by command-line options.
     */
    if (stat(pidfile, &statbuf) == 0)
    {    

        pg_log_info("standby is up and running");
        pg_log_info("stopping the server to start the transformation steps");
        stop_standby_server(pg_ctl_path, opt.subscriber_dir);
    }    

    /*   
     * Start a short-lived standby server with temporary parameters (provided
     * by command-line options). The goal is to avoid connections during the
     * transformation steps.
     */
    pg_log_info("starting the standby with command-line options");
    start_standby_server(&opt, pg_ctl_path, server_start_log, true);


12. others

As Peter E pointed out [1], the main function is still huge; it has more than 400 lines.
I think all functions should have fewer than 100 lines to keep them readable.

The previous versions moved a lot of code into separate functions to improve
readability. Since you mentioned it again, I did some refactoring to move more
code out of the main function. I created 2 new functions: setup_recovery
(groups instructions in preparation for recovery) and
drop_primary_replication_slot (removes the primary replication slot if it
exists). At this point, the main steps are in their own functions, making the
code easier to understand IMO.

    /* Get the absolute path of pg_ctl and pg_resetwal on the subscriber */
    pg_ctl_path = get_exec_path(argv[0], "pg_ctl");
    pg_resetwal_path = get_exec_path(argv[0], "pg_resetwal");
    ...
    pg_log_info("Done!");

The code snippet above has ~ 120 lines and the majority of the lines are
comments.

I considered a separation like the one below. Note that this may require
changing the ordering. What do you think?

* add parse_command_options() which accepts user options and verifies them
* add verification_phase() or something which checks system identifier and calls check_XXX
* add catchup_phase() or something which creates a temporary slot, writes recovery parameters,
  and wait until the end of recovery
* add cleanup_phase() or something which removes primary_slot and modifies the
  system identifier
* stop/start server can be combined into one wrapper.

IMO generic steps are more difficult to understand; I tend to avoid them.
However, as I said above, I moved some code into separate functions. We could
probably consider grouping the checks for required/optional arguments into
their own function. Other than that, it wouldn't reduce the size of main() or
improve readability.

The attached txt file is a proof of concept.

13. others

PQresultStatus(res) is called 17 times in this source code, which seems redundant.
I think we can introduce a function like executeQueryOrDie() and gather the checks in one place.

That's a good goal.

14. others

I found that pg_createsubscriber does not use functions declared in other files.
Is there a possibility of reusing some, e.g., from streamutils.h?

Which functions? IIRC we discussed it at the beginning of this thread but I
didn't find any low-hanging fruit to use in this code.

15. others 

While reading the old discussions [2], I saw that Amit suggested keeping the
comment and avoiding the creation of a temporary slot. You said "Got it", but the
temporary slot still exists. Is there a reason? Could you clarify your opinion?

I decided to refactor the code to do what Amit Kapila suggested: use the last
replication slot LSN as the replication start point. I kept it in a separate
patch (v26-0002) to make it easier to review. I'm planning to incorporate it if
nobody objects.

16. others

While reading [2] and [3], I was confused by the decision. You and Amit
discussed combining pg_createsubscriber with slot sync and how to handle slots
on the physical standby. You seemed to agree to remove such a slot, and Amit
also suggested raising an ERROR. However, you said in [8] that such handling is
not mandatory, so a WARNING should be raised in dry_run mode. I was quite
confused. Am I missing something?

I didn't address this item in this patch. As you pointed out, pg_resetwal does
nothing if replication slots exist. That's not an excuse for doing nothing here.
I agree that we should check this case and provide a suitable error message. If
you have a complex replication scenario, users won't be happy with this
restriction. We can always improve the UI (dropping replication slots during the
execution if an option is provided, for example).

17. others

Per the discussion around [4], we might have to consider the case where some options
like data_directory and config_file were initially specified for the standby server.
Another easy approach is to allow users to specify options like -o in pg_upgrade [5],
which is similar to your idea. Thoughts?

I didn't address this item in this patch. I have a half baked patch for it. The
proposal is exactly to allow appending config_file option into -o.

pg_ctl start -o "-c config_file=/etc/postgresql/17/postgresql.conf" ...


18. others

How do you handle the reported failure [6]?

It is a PEBCAK. I don't see an easy way to detect scenario 1; in the current
mode, we are susceptible to this human failure. The base backup support won't
allow it. Regarding scenario 2, the referred error is the way to catch this
wrong command line. Do you expect a different message?

19. main
```
char    *pub_base_conninfo = NULL;
char    *sub_base_conninfo = NULL;
char    *dbname_conninfo = NULL;
```

No need to initialize pub_base_conninfo and sub_base_conninfo.
These variables would not be free'd.

Changed.

20. others

IIUC, slot creation will not finish if there are prepared transactions.
Should we detect this in the verification phase and raise an ERROR?

Maybe. If we decide to do it, we should also check all cases not just prepared
transactions. The other option is to add a sentence into the documentation.

21. others

As I said in [7], the catch-up would not finish if a long recovery_min_apply_delay
is used. Should we override it during the catch-up?

No. If the time-delayed logical replica [2] were available, I would say that we
could use the apply delay for the logical replica. The user can expect the
replica to keep the configured apply delay, but that's not the case if we
silently ignore it. I'm not sure an error is appropriate here because it
requires an extra step. Another option is to print a message saying there is an
apply delay; in dry run mode, the user can detect this case and make a
decision. Does that seem reasonable?

22. pg_createsubscriber.sgml
```
    <para>
     Check
     Write recovery parameters into the target data...
```

I'm not sure, but "Check" seems unneeded.

It was a typo. Fixed.



--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Mar 6, 2024, at 8:24 AM, vignesh C wrote:
Few comments:

Thanks for your review. Some changes are included in v26.

1)   Can we use strdup here instead of atoi, as pg_dump does? Otherwise we do
a double conversion: to an integer using atoi and back to a string while
forming the connection string:
+                       case 'p':
+                               if ((opt.sub_port = atoi(optarg)) <= 0)
+                                       pg_fatal("invalid subscriber
port number");
+                               break;

I don't have a strong preference but decided to provide a patch for it. See
v26-0003.

2) We could enforce a valid range for this; otherwise we end up with
unexpected values when a large number is specified:
+                       case 't':
+                               opt.recovery_timeout = atoi(optarg);
+                               break;

I wouldn't like to add an arbitrary value. Suggestions?

3) Now that we have addressed most of the items, can we handle this TODO:
+               /*
+                * TODO use primary_conninfo (if available) from subscriber and
+                * extract publisher connection string. Assume that there are
+                * identical entries for physical and logical
replication. If there is
+                * not, we would fail anyway.
+                */
+               pg_log_error("no publisher connection string specified");
+               pg_log_error_hint("Try \"%s --help\" for more
information.", progname);
+               exit(1);

It is not in my top priority at the moment.

4)  The default log level is info here; I was not sure how to set it to
debug level to get these debug messages:
+               pg_log_debug("publisher(%d): connection string: %s",
i, dbinfo[i].pubconninfo);
+               pg_log_debug("subscriber(%d): connection string: %s",
i, dbinfo[i].subconninfo);

      <term><option>-v</option></term>
      <term><option>--verbose</option></term>
      <listitem>
       <para>
        Enables verbose mode. This will cause
        <application>pg_createsubscriber</application> to output progress messages
        and detailed information about each step to standard error.
        Repeating the option causes additional debug-level messages to appear on
        standard error.
       </para>

5) Currently, in non-verbose mode, no messages are printed to the console.
We could print a few of them regardless of the verbose setting, like the
following:
a) creating publication
b) creating replication slot
c) waiting for the target server to reach the consistent state
d) If pg_createsubscriber fails after this point, you must recreate
the physical replica before continuing.
e) creating subscription

That's the idea. Quiet mode by default.

6) The message should be "waiting for the target server to reach the
consistent state":
+#define NUM_CONN_ATTEMPTS      5
+
+       pg_log_info("waiting the target server to reach the consistent state");
+
+       conn = connect_database(conninfo, true);

Fixed.


--
Euler Taveira

Re: speed up a logical replica setup

From
vignesh C
Date:
On Thu, 7 Mar 2024 at 10:05, Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Mar 6, 2024, at 7:02 AM, Hayato Kuroda (Fujitsu) wrote:
>
> Thanks for updating the patch!
>
>
> Thanks for the feedback. I'm attaching v26 that addresses most of your comments
> and some issues pointed by Vignesh [1].

Few comments:
1) We disconnect the database and then disconnect again in the error case,
which leads to a double free in this scenario:
+       PQclear(res);
+
+       disconnect_database(conn, false);
+
+       if (max_repslots < num_dbs)
+       {
+               pg_log_error("subscriber requires %d replication
slots, but only %d remain",
+                                        num_dbs, max_repslots);
+               pg_log_error_hint("Consider increasing
max_replication_slots to at least %d.",
+                                                 num_dbs);
+               disconnect_database(conn, true);
+       }
+
+       if (max_lrworkers < num_dbs)
+       {
+               pg_log_error("subscriber requires %d logical
replication workers, but only %d remain",
+                                        num_dbs, max_lrworkers);
+               pg_log_error_hint("Consider increasing
max_logical_replication_workers to at least %d.",
+                                                 num_dbs);
+               disconnect_database(conn, true);
+       }

pg_createsubscriber: error: subscriber requires 5 logical replication
workers, but only 4 remain
pg_createsubscriber: hint: Consider increasing
max_logical_replication_workers to at least 5.
free(): double free detected in tcache 2
Aborted

2) We could also check that the primary is not using
synchronous_standby_names; otherwise all the transactions on the primary will
wait indefinitely once the standby server is stopped. This could be added to
the documentation:
+/*
+ * Is the primary server ready for logical replication?
+ */
+static void
+check_publisher(struct LogicalRepInfo *dbinfo)
+{
+       PGconn     *conn;
+       PGresult   *res;
+
+       char       *wal_level;
+       int                     max_repslots;
+       int                     cur_repslots;
+       int                     max_walsenders;
+       int                     cur_walsenders;
+
+       pg_log_info("checking settings on publisher");
+
+       conn = connect_database(dbinfo[0].pubconninfo, true);
+
+       /*
+        * If the primary server is in recovery (i.e. cascading replication),
+        * objects (publication) cannot be created because it is read only.
+        */
+       if (server_is_in_recovery(conn))
+       {
+               pg_log_error("primary server cannot be in recovery");
+               disconnect_database(conn, true);
+       }

3) This check is present only for the publication; we do not have it when
creating a subscription. We should keep both consistent, i.e., have the check
in both places or in neither:
+       /* Check if the publication already exists */
+       appendPQExpBuffer(str,
+                                         "SELECT 1 FROM pg_catalog.pg_publication "
+                                         "WHERE pubname = '%s'",
+                                         dbinfo->pubname);
+       res = PQexec(conn, str->data);
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not obtain publication information: %s",
+                                        PQresultErrorMessage(res));
+               disconnect_database(conn, true);
+       }
+
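For what it's worth, the subscriber-side counterpart could look like the sketch below (a hypothetical helper, not taken from the patch; the real code would run the query through PQexec() on an open connection and escape the subscription name properly):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): build the query that checks
 * whether a subscription with the given name already exists, mirroring
 * the existing publication check on the publisher side.
 */
void
build_subscription_check_query(char *buf, size_t buflen, const char *subname)
{
	snprintf(buf, buflen,
			 "SELECT 1 FROM pg_catalog.pg_subscription "
			 "WHERE subname = '%s'",
			 subname);
}
```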

4) A few options are missing:
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_createsubscriber</command>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+   <group choice="plain">
+    <group choice="req">
+     <arg choice="plain"><option>-d</option></arg>
+     <arg choice="plain"><option>--database</option></arg>
+    </group>
+    <replaceable>dbname</replaceable>
+    <group choice="req">
+     <arg choice="plain"><option>-D</option> </arg>
+     <arg choice="plain"><option>--pgdata</option></arg>
+    </group>
+    <replaceable>datadir</replaceable>
+    <group choice="req">
+     <arg choice="plain"><option>-P</option></arg>
+     <arg choice="plain"><option>--publisher-server</option></arg>
+    </group>
+    <replaceable>connstr</replaceable>
+   </group>
+  </cmdsynopsis>
+ </refsynopsisdiv>

4a) -n, --dry-run
4b) -p, --subscriber-port
4c) -r, --retain
4d) -s, --socket-directory
4e) -t, --recovery-timeout
4f) -U, --subscriber-username

5) typo connnections should be connections
+       <para>
+        The port number on which the target server is listening for
connections.
+        Defaults to running the target server on port 50432 to avoid unintended
+        client connnections.

6) repliation should be replication
+ * Create the subscriptions, adjust the initial location for logical
+ * replication and enable the subscriptions. That's the last step for logical
+ * repliation setup.

7) I did not notice these changes in the latest patch:
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d808aad8b0..08de2bf4e6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -517,6 +517,7 @@ CreateSeqStmt
 CreateStatsStmt
 CreateStmt
 CreateStmtContext
+CreateSubscriberOptions
 CreateSubscriptionStmt
 CreateTableAsStmt
 CreateTableSpaceStmt
@@ -1505,6 +1506,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalRepInfo
 LogicalRepMsgType
 LogicalRepPartMapEntry
 LogicalRepPreparedTxnData

Regards,
Vignesh



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

> Thanks for the feedback. I'm attaching v26 that addresses most of your comments
> and some issues pointed by Vignesh [1].

I have created a top-up patch v27-0004. It contains additional test
cases for pg_createsubscriber.

Currently, two testcases (in v27-0004 patch) are failing. These
failures are related to running pg_createsubscriber on nodes in
cascade physical replication and are already reported in [1] and [2].
I think these cases should be fixed. Thoughts?

The idea of this patch is to keep track of testcases, so that any
future patch does not break a scenario that has already been handled.
These testcases can be used during our development process, but which
tests should actually be committed can be discussed later. Thoughts?

[1]: https://www.postgresql.org/message-id/CAHv8Rj+5mzK9Jt+7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g@mail.gmail.com
[2]:
https://www.postgresql.org/message-id/CAA4eK1Kq8qWiBK1-ky%2BjkuJRedoWLh5%3DVOmd%2BYwh9L8PUxdq%2BQ%40mail.gmail.com



Thanks and regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
Tomas Vondra
Date:
Hi,

I decided to take a quick look on this patch today, to see how it works
and do some simple tests. I've only started to get familiar with it, so
I have only some comments / questions regarding usage, not on the code.
It's quite possible I didn't understand some finer points, or maybe it
was already discussed earlier in this very long thread, so please feel
free to push back or point me to the past discussion.

Also, some of this is rather opinionated, but considering I didn't see
this patch before, my opinions may easily be wrong ...


1) SGML docs

It seems the SGML docs are more about explaining how this works on the
inside, rather than how to use the tool. Maybe that's intentional, but
as someone who didn't work with pg_createsubscriber before I found it
confusing and not very helpful.

For example, the first half of the page is prerequisites + warnings, and
sure those are useful details, but the prerequisites are checked by the
tool (so I can't really miss them) and the warnings go into a lot of detail
about different places where things may go wrong. Sure, worth knowing
and including in the docs, but maybe not right at the beginning, before
I learn how to even run the tool?

Maybe that's just me, though. Also, I'm sure it's not the only part of
our docs like this. Perhaps it'd be good to reorganize the content a bit
to make the "how to use" stuff more prominent?


2) this is a bit vague

... pg_createsubscriber will check a few times if the connection has
been reestablished to stream the required WAL. After a few attempts, it
terminates with an error.

What does "a few times" mean, and how many is "a few attempts"? Seems
worth knowing when using this tool in environments where disconnections
can happen. Maybe this should be configurable?
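A minimal sketch of what a configurable retry loop could look like (hypothetical; the patch itself hard-codes the number of attempts, and the callback name here is made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch (not from the patch): retry a connection probe up to
 * max_attempts times.  try_connect stands in for whatever callback
 * re-checks that the replication connection has been reestablished.
 */
bool
wait_for_connection(bool (*try_connect)(void *), void *arg, int max_attempts)
{
	for (int attempt = 0; attempt < max_attempts; attempt++)
	{
		if (try_connect(arg))
			return true;
		/* the real tool would pg_usleep() between attempts */
	}
	return false;
}

/* Example probe that succeeds on the Nth call, for demonstration. */
int			demo_calls = 0;

bool
demo_probe(void *arg)
{
	int			succeed_on = *(int *) arg;

	return ++demo_calls >= succeed_on;
}
```

Exposing max_attempts (and the sleep interval) as command-line options would make the behavior predictable in flaky network environments.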


3) publication info

For a while I was quite confused about which tables get replicated,
until I realized the publication is FOR ALL TABLES. But I learned that
from this thread, the docs say nothing about this. Surely that's an
important detail that should be mentioned?


4) Is FOR ALL TABLES a good idea?

I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
sure it won't work for a number of use cases. I know that in large databases
it's common to create "work tables" (not necessarily temporary) as part
of a batch job, but there's no need to replicate those tables.

AFAIK that'd break this FOR ALL TABLES publication, because the tables
will qualify for replication, but won't be present on the subscriber. Or
did I miss something?

I do understand that FOR ALL TABLES is the simplest approach, and for v1
it may be an acceptable limitation, but maybe it'd be good to also
support restricting which tables should be replicated (e.g. blacklist or
whitelist based on table/schema name?).

BTW if I'm right and creating a table breaks the subscriber creation,
maybe it'd be good to explicitly mention that in the docs.

Note: I now realize this might fall under the warning about DDL, which
says this:

    Executing DDL commands on the source server while running
    pg_createsubscriber is not recommended. If the target server has
    already been converted to logical replica, the DDL commands must
    not be replicated so an error would occur.

But I find this confusing. Surely there are many DDL commands that have
absolutely no impact on logical replication (like creating an index or
view, various ALTER TABLE flavors, and so on). And running such DDL
certainly does not trigger error, right?


5) slot / publication / subscription name

I find it somewhat annoying it's not possible to specify names for
objects created by the tool - replication slots, publication and
subscriptions. If this is meant to be a replica running for a while,
after a while I'll have no idea what pg_createsubscriber_569853 or
pg_createsubscriber_459548_2348239 was meant for.

This is particularly annoying because renaming these objects later is
either not supported at all (e.g. for replication slots), or may be
quite difficult (e.g. publications).

I do realize there are challenges with custom names (say, if there are
multiple databases to replicate), but can't we support some simple
formatting with basic placeholders? So we could specify

--slot-name "myslot_%d_%p"

or something like that?
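Such a pattern could be expanded with a small helper along these lines (a sketch assuming %d = database name and %p = port/PID; nothing like this exists in the patch, and real code would need stricter truncation handling):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: expand %d (database name) and %p (port or PID)
 * placeholders in a user-supplied object name pattern.
 */
void
expand_name_pattern(char *out, size_t outlen, const char *pattern,
					const char *dbname, const char *pid)
{
	size_t		used = 0;

	for (const char *p = pattern; *p != '\0' && used + 1 < outlen; p++)
	{
		const char *sub = NULL;

		if (p[0] == '%' && p[1] == 'd')
			sub = dbname;
		else if (p[0] == '%' && p[1] == 'p')
			sub = pid;

		if (sub != NULL)
		{
			int			n = snprintf(out + used, outlen - used, "%s", sub);

			used += (size_t) n;
			if (used >= outlen)
				used = outlen - 1;	/* clamp on truncation */
			p++;				/* skip the placeholder letter */
		}
		else
			out[used++] = *p;
	}
	out[used] = '\0';
}
```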

BTW what will happen if we convert multiple standbys? Can't they all get
the same slot name (they all have the same database OID, and I'm not
sure how much entropy the PID has)?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

> Thanks for the feedback. I'm attaching v26 that addresses most of your comments
> and some issues pointed by Vignesh [1].

I have tested the patch on Windows. The pg_createsubscriber command is
failing on Windows:

pg_createsubscriber -D ..\standby -d postgres  -P "host=localhost
port=5432" --subscriber-port 9000  -r -v
pg_createsubscriber: validating connection string on publisher
pg_createsubscriber: validating connection string on subscriber
pg_createsubscriber: checking if directory "../standby" is a cluster
data directory
pg_createsubscriber: getting system identifier from publisher
pg_createsubscriber: system identifier is 7343852918334005220 on publisher
pg_createsubscriber: getting system identifier from subscriber
pg_createsubscriber: system identifier is 7343852918334005220 on subscriber
pg_createsubscriber: standby is up and running
pg_createsubscriber: stopping the server to start the transformation steps
pg_createsubscriber: server was stopped
pg_createsubscriber: starting the standby with command-line options
pg_createsubscriber: server was started
pg_createsubscriber: checking settings on subscriber
pg_createsubscriber: error: connection to database failed: connection
to server on socket "D:/project/pg_euler_v27_debug/bin/.s.PGSQL.9000"
failed: Connection refused (0x0000274D/10061)
        Is the server running locally and accepting connections on that socket?

I found out that
+ sub_base_conninfo = psprintf("host=%s port=%u user=%s fallback_application_name=%s",
+                opt.socket_dir, opt.sub_port, opt.sub_username, progname);

sub_base_conninfo has 'host' even on Windows.
So when 'start_standby_server' is called, the server is started on
localhost, but when we try to connect to the standby inside
'check_subscriber', it tries to connect to the host defined in
'sub_base_conninfo'. That is why we get the error.

Created a top-up patch v27-0005 to resolve this.
Since there is no change in the 0001, 0002...0004 patches that were
previously posted, I have reused the same version number.

Thanks and regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Thu, 7 Mar 2024 at 18:31, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> Hi,
>
> > Thanks for the feedback. I'm attaching v26 that addresses most of your comments
> > and some issues pointed by Vignesh [1].
>
> I have created a top-up patch v27-0004. It contains additional test
> cases for pg_createsubscriber.
>
> Currently, two testcases (in v27-0004 patch) are failing. These
> failures are related to running pg_createsubscriber on nodes in
> cascade physical replication and are already reported in [1] and [2].
> I think these cases should be fixed. Thoughts?

We can fix these issues; if we are not planning to fix any of them, we
can document the limitation instead.

> The idea of this patch is to keep track of testcases, so that any
> future patch does not break any scenario which has already been worked
> on. These testcases can be used to test in our development process,
> but which test should actually be committed, can be discussed later.
> Thought?

Few comments for v27-0004-Add-additional-testcases.patch:
1) We could use command_fails_like to verify the reason of the error:
+# set max_replication_slots
+$node_p->append_conf('postgresql.conf', 'max_replication_slots = 3');
+$node_p->restart;
+command_fails(
+       [
+               'pg_createsubscriber', '--verbose',
+               '--dry-run', '--pgdata',
+               $node_s->data_dir, '--publisher-server',
+               $node_p->connstr('pg1'), '--socket-directory',
+               $node_s->host, '--subscriber-port',
+               $node_s->port, '--database',
+               'pg1', '--database',
+               'pg2',
+       ],
+       'max_replication_slots are less in number in publisher');

2) Add a comment saying what is being verified
+# set max_replication_slots
+$node_p->append_conf('postgresql.conf', 'max_replication_slots = 3');
+$node_p->restart;
+command_fails(
+       [
+               'pg_createsubscriber', '--verbose',
+               '--dry-run', '--pgdata',
+               $node_s->data_dir, '--publisher-server',
+               $node_p->connstr('pg1'), '--socket-directory',
+               $node_s->host, '--subscriber-port',
+               $node_s->port, '--database',
+               'pg1', '--database',
+               'pg2',
+       ],
+       'max_replication_slots are less in number in publisher');

3) We could rename this file to something like
pg_create_subscriber_failure_cases, or something better:
 src/bin/pg_basebackup/t/041_tests.pl | 285 +++++++++++++++++++++++++++
 1 file changed, 285 insertions(+)
 create mode 100644 src/bin/pg_basebackup/t/041_tests.pl

diff --git a/src/bin/pg_basebackup/t/041_tests.pl
b/src/bin/pg_basebackup/t/041_tests.pl
new file mode 100644
index 0000000000..2889d60d54
--- /dev/null
+++ b/src/bin/pg_basebackup/t/041_tests.pl
@@ -0,0 +1,285 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group


4) This success case is not required as this would have already been
covered in 040_pg_createsubscriber.pl:
+$node_p->append_conf('postgresql.conf', 'max_replication_slots = 4');
+$node_p->restart;
+command_ok(
+       [
+               'pg_createsubscriber', '--verbose',
+               '--dry-run', '--pgdata',
+               $node_s->data_dir, '--publisher-server',
+               $node_p->connstr('pg1'), '--socket-directory',
+               $node_s->host, '--subscriber-port',
+               $node_s->port, '--database',
+               'pg1', '--database',
+               'pg2',
+       ],
+       'max_replication_slots are accurate on publisher');

5)  We could use command_fails_like to verify the reason of the error:
$node_s->append_conf('postgresql.conf', 'max_replication_slots = 1');
$node_s->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_replication_slots are less in number in subscriber');

6) Add a comment saying what is being verified
$node_s->append_conf('postgresql.conf', 'max_replication_slots = 1');
$node_s->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_replication_slots are less in number in subscriber');

7) This success case is not required as this would have already been
covered in 040_pg_createsubscriber.pl:
$node_s->append_conf('postgresql.conf', 'max_replication_slots = 2');
$node_s->restart;
command_ok(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_replication_slots are less in number in subscriber');

8) We could use command_fails_like to verify the reason of the error:
# set wal_level on publisher
$node_p->append_conf('postgresql.conf', 'wal_level = \'replica\'');
$node_p->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'wal_level must be logical');

9) Add a comment saying what is being verified
# set wal_level on publisher
$node_p->append_conf('postgresql.conf', 'wal_level = \'replica\'');
$node_p->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'wal_level must be logical');

10) This success case is not required as this would have already been
covered in 040_pg_createsubscriber.pl:
$node_p->append_conf('postgresql.conf', 'wal_level = \'logical\'');
$node_p->restart;
command_ok(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'wal_level is logical');

11) We could use command_fails_like to verify the reason of the error:
# set max_wal_senders on publisher
$node_p->append_conf('postgresql.conf', 'max_wal_senders = 2');
$node_p->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_wal_senders is not sufficient');

12) Add a comment saying what is being verified:
# set max_wal_senders on publisher
$node_p->append_conf('postgresql.conf', 'max_wal_senders = 2');
$node_p->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_wal_senders is not sufficient');

13) This success case is not required as this would have already been
covered in 040_pg_createsubscriber.pl:
$node_p->append_conf('postgresql.conf', 'max_wal_senders = 3');
$node_p->restart;
command_ok(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_wal_senders is sufficient');

14) This sleep is not required
# max_worker_processes on subscriber
$node_p->append_conf('postgresql.conf', 'max_worker_processes = 2');
$node_p->restart;
sleep 1;
$node_s->append_conf('postgresql.conf', 'max_worker_processes = 2');
$node_s->restart;

15) Similar comments apply in other places too; I'm not repeating them.

Regards,
Vignesh



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Tomas, Euler,

Thanks for starting to read the thread! Since I'm not an original author,
I want to reply partially.

> I decided to take a quick look on this patch today, to see how it works
> and do some simple tests. I've only started to get familiar with it, so
> I have only some comments / questions regarding usage, not on the code.
> It's quite possible I didn't understand some finer points, or maybe it
> was already discussed earlier in this very long thread, so please feel
> free to push back or point me to the past discussion.
> 
> Also, some of this is rather opinionated, but considering I didn't see
> this patch before, my opinions may easily be wrong ...

I felt your comments were quite valuable.

> 1) SGML docs
> 
> It seems the SGML docs are more about explaining how this works on the
> inside, rather than how to use the tool. Maybe that's intentional, but
> as someone who didn't work with pg_createsubscriber before I found it
> confusing and not very helpful.
>
> For example, the first half of the page is prerequisities+warning, and
> sure those are useful details, but prerequisities are checked by the
> tool (so I can't really miss this) and warnings go into a lot of details
> about different places where things may go wrong. Sure, worth knowing
> and including in the docs, but maybe not right at the beginning, before
> I learn how to even run the tool?

Hmm, right. I considered the improvements below. Tomas and Euler, what do you think?

* Adds more descriptions in the "Description" section.
* Moves the prerequisites + warnings to the "Notes" section.
* Adds a "Usage" section which describes starting from a single node.

> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> sure it won't work for a number of use cases. I know large databases
> it's common to create "work tables" (not necessarily temporary) as part
> of a batch job, but there's no need to replicate those tables.

Indeed, the documentation does not describe that all tables in the database
would be included in the publication.

> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> it may be an acceptable limitation, but maybe it'd be good to also
> support restricting which tables should be replicated (e.g. blacklist or
> whitelist based on table/schema name?).

It may not be directly related, but we considered accepting such options as a next step [1].

> Note: I now realize this might fall under the warning about DDL, which
> says this:
> 
>     Executing DDL commands on the source server while running
>     pg_createsubscriber is not recommended. If the target server has
>     already been converted to logical replica, the DDL commands must
>     not be replicated so an error would occur.

Yeah, they would not be replicated, but they would not lead to an ERROR.
So should we say something like "Creating tables on the source server..."?

> 5) slot / publication / subscription name
> 
> I find it somewhat annoying it's not possible to specify names for
> objects created by the tool - replication slots, publication and
> subscriptions. If this is meant to be a replica running for a while,
> after a while I'll have no idea what pg_createsubscriber_569853 or
> pg_createsubscriber_459548_2348239 was meant for.
> 
> This is particularly annoying because renaming these objects later is
> either not supported at all (e.g. for replication slots), or may be
> quite difficult (e.g. publications).
> 
> I do realize there are challenges with custom names (say, if there are
> multiple databases to replicate), but can't we support some simple
> formatting with basic placeholders? So we could specify
> 
> --slot-name "myslot_%d_%p"
> 
> or something like that?

Not sure we can do that in the first version, but it looks nice. One concern is that I
cannot find applications that accept escape strings like log_line_prefix.
(It may just be because we have no use case.) Do you know of examples?

> BTW what will happen if we convert multiple standbys? Can't they all get
> the same slot name (they all have the same database OID, and I'm not
> sure how much entropy the PID has)?

I tested it and the second try did not work. The primary reason was the publication
name - pg_createsubscriber_%u (oid).
FYI - previously we could reuse the same publications, but based on my comment [2] that
feature was removed. It might have been too optimistic.
[1]:
https://www.postgresql.org/message-id/TY3PR01MB9889CCBD4D9DAF8BD2F18541F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077756323B79042F29DDAEDF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
"Andrey M. Borodin"
Date:

> On 8 Mar 2024, at 12:03, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
>
<v27-0004-Add-additional-testcases.patch><v27-0001-pg_createsubscriber-creates-a-new-logical-replic.patch><v27-0005-Fix-error-for-windows.patch><v27-0002-Use-latest-replication-slot-position-as-replicat.patch><v27-0003-port-replace-int-with-string.patch>

I haven't dug into the thread, but the recent version fails some of CFbot's tests.

http://commitfest.cputube.org/euler-taveira.html
https://cirrus-ci.com/task/4833499115421696
==29928==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a000001458 at pc 0x7f3b29fdedce bp 0x7ffe68fcf1c0 sp 0x7ffe68fcf1b8

Thanks!


Best regards, Andrey Borodin.


Re: speed up a logical replica setup

From
Tomas Vondra
Date:

On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote:
> Dear Tomas, Euler,
> 
> Thanks for starting to read the thread! Since I'm not an original author,
> I want to reply partially.
> 
>> I decided to take a quick look on this patch today, to see how it works
>> and do some simple tests. I've only started to get familiar with it, so
>> I have only some comments / questions regarding usage, not on the code.
>> It's quite possible I didn't understand some finer points, or maybe it
>> was already discussed earlier in this very long thread, so please feel
>> free to push back or point me to the past discussion.
>>
>> Also, some of this is rather opinionated, but considering I didn't see
>> this patch before, my opinions may easily be wrong ...
> 
> I felt your comments were quite valuable.
> 
>> 1) SGML docs
>>
>> It seems the SGML docs are more about explaining how this works on the
>> inside, rather than how to use the tool. Maybe that's intentional, but
>> as someone who didn't work with pg_createsubscriber before I found it
>> confusing and not very helpful.
>>
>> For example, the first half of the page is prerequisities+warning, and
>> sure those are useful details, but prerequisities are checked by the
>> tool (so I can't really miss this) and warnings go into a lot of details
>> about different places where things may go wrong. Sure, worth knowing
>> and including in the docs, but maybe not right at the beginning, before
>> I learn how to even run the tool?
> 
> Hmm, right. I considered below improvements. Tomas and Euler, how do you think?
> 
> * Adds more descriptions in "Description" section.
> * Moves prerequisities+warning to "Notes" section.
> * Adds "Usage" section which describes from a single node.
> 
>> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
>> sure it won't work for a number of use cases. I know large databases
>> it's common to create "work tables" (not necessarily temporary) as part
>> of a batch job, but there's no need to replicate those tables.
> 
> Indeed, the documentation does not describe that all tables in the database
> would be included in the publication.
> 
>> I do understand that FOR ALL TABLES is the simplest approach, and for v1
>> it may be an acceptable limitation, but maybe it'd be good to also
>> support restricting which tables should be replicated (e.g. blacklist or
>> whitelist based on table/schema name?).
> 
> May not directly related, but we considered that accepting options was a next-step [1].
> 
>> Note: I now realize this might fall under the warning about DDL, which
>> says this:
>>
>>     Executing DDL commands on the source server while running
>>     pg_createsubscriber is not recommended. If the target server has
>>     already been converted to logical replica, the DDL commands must
>>     not be replicated so an error would occur.
> 
> Yeah, they would not be replicated, but they would not lead to an ERROR.
> So should we say something like "Creating tables on the source server..."?
> 

Perhaps. Clarifying the docs would help, but it depends on the wording.
For example, I doubt this should talk about "creating tables" because
there are other DDL that (probably) could cause issues (like adding a
column to the table, or something like that).

>> 5) slot / publication / subscription name
>>
>> I find it somewhat annoying it's not possible to specify names for
>> objects created by the tool - replication slots, publication and
>> subscriptions. If this is meant to be a replica running for a while,
>> after a while I'll have no idea what pg_createsubscriber_569853 or
>> pg_createsubscriber_459548_2348239 was meant for.
>>
>> This is particularly annoying because renaming these objects later is
>> either not supported at all (e.g. for replication slots), or may be
>> quite difficult (e.g. publications).
>>
>> I do realize there are challenges with custom names (say, if there are
>> multiple databases to replicate), but can't we support some simple
>> formatting with basic placeholders? So we could specify
>>
>> --slot-name "myslot_%d_%p"
>>
>> or something like that?
> 
> Not sure we can do in the first version, but looks nice. One concern is that I
> cannot find applications which accepts escape strings like log_line_prefix.
> (It may just because we do not have use-case.) Do you know examples?
> 

I can't think of a tool already doing that, but I think that's simply
because it was not needed. Why should we be concerned about this?

>> BTW what will happen if we convert multiple standbys? Can't they all get
>> the same slot name (they all have the same database OID, and I'm not
>> sure how much entropy the PID has)?
> 
> I tested and the second try did not work. The primal reason was the name of publication
> - pg_createsubscriber_%u (oid).
> FYI - previously we can reuse same publications, but based on my comment [2] the
> feature was removed. It might be too optimistic.
> 

OK. I could be convinced the other limitations are reasonable for v1 and
can be improved later, but this seems like something that needs fixing.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: speed up a logical replica setup

From
vignesh C
Date:
On Fri, 8 Mar 2024 at 15:31, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 8 Mar 2024, at 12:03, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> >
<v27-0004-Add-additional-testcases.patch><v27-0001-pg_createsubscriber-creates-a-new-logical-replic.patch><v27-0005-Fix-error-for-windows.patch><v27-0002-Use-latest-replication-slot-position-as-replicat.patch><v27-0003-port-replace-int-with-string.patch>
>
> I haven't dug into the thread, but the recent version fails some of CFbot's tests.
>
> http://commitfest.cputube.org/euler-taveira.html
> https://cirrus-ci.com/task/4833499115421696
> ==29928==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a000001458 at pc 0x7f3b29fdedce bp 0x7ffe68fcf1c0 sp 0x7ffe68fcf1b8

This is because disconnect_database is called twice in the error flow:
+       PQclear(res);
+
+       disconnect_database(conn, false);
+
+       if (max_repslots < num_dbs)
+       {
+               pg_log_error("subscriber requires %d replication slots, but only %d remain",
+                                        num_dbs, max_repslots);
+               pg_log_error_hint("Consider increasing max_replication_slots to at least %d.",
+                                                 num_dbs);
+               disconnect_database(conn, true);
+       }
+
+       if (max_lrworkers < num_dbs)
+       {
+               pg_log_error("subscriber requires %d logical replication workers, but only %d remain",
+                                        num_dbs, max_lrworkers);
+               pg_log_error_hint("Consider increasing max_logical_replication_workers to at least %d.",
+                                                 num_dbs);
+               disconnect_database(conn, true);
+       }
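For illustration, a common way to make such error paths immune to a second disconnect (a sketch of the general pattern with a dummy connection type, not the actual fix in the attached patch set) is to clear the handle after closing it:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a libpq connection, for illustration only. */
typedef struct DummyConn
{
	int			open;
} DummyConn;

/*
 * Sketch: take the pointer by reference and NULL it out after closing,
 * so calling disconnect twice on the same handle cannot double-free.
 */
void
disconnect_once(DummyConn **conn)
{
	if (*conn == NULL)
		return;					/* already disconnected: no-op */
	free(*conn);
	*conn = NULL;
}
```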

This is handled in the attached patch set. Apart from this, there are a
couple of test failures, which will be addressed in an upcoming version.

Regards,
Vignesh

Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Sat, 9 Mar 2024 at 00:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote:
> > Dear Tomas, Euler,
> >
> > Thanks for starting to read the thread! Since I'm not an original author,
> > I want to reply partially.
> >
> >> I decided to take a quick look on this patch today, to see how it works
> >> and do some simple tests. I've only started to get familiar with it, so
> >> I have only some comments / questions regarding usage, not on the code.
> >> It's quite possible I didn't understand some finer points, or maybe it
> >> was already discussed earlier in this very long thread, so please feel
> >> free to push back or point me to the past discussion.
> >>
> >> Also, some of this is rather opinionated, but considering I didn't see
> >> this patch before, my opinions may easily be wrong ...
> >
> > I felt your comments were quite valuable.
> >
> >> 1) SGML docs
> >>
> >> It seems the SGML docs are more about explaining how this works on the
> >> inside, rather than how to use the tool. Maybe that's intentional, but
> >> as someone who didn't work with pg_createsubscriber before I found it
> >> confusing and not very helpful.
> >>
> >> For example, the first half of the page is prerequisites+warning, and
> >> sure those are useful details, but prerequisites are checked by the
> >> tool (so I can't really miss this) and warnings go into a lot of details
> >> about different places where things may go wrong. Sure, worth knowing
> >> and including in the docs, but maybe not right at the beginning, before
> >> I learn how to even run the tool?
> >
> > Hmm, right. I considered the improvements below. Tomas and Euler, what do you think?
> >
> > * Adds more descriptions in "Description" section.
> > * Moves prerequisites+warning to "Notes" section.
> > * Adds "Usage" section which describes from a single node.
> >
> >> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> >> sure it won't work for a number of use cases. I know in large databases
> >> it's common to create "work tables" (not necessarily temporary) as part
> >> of a batch job, but there's no need to replicate those tables.
> >
> > Indeed, the documentation does not describe that all tables in the database
> > would be included in the publication.
> >
> >> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> >> it may be an acceptable limitation, but maybe it'd be good to also
> >> support restricting which tables should be replicated (e.g. blacklist or
> >> whitelist based on table/schema name?).
> >
> > May not directly related, but we considered that accepting options was a next-step [1].
> >
> >> Note: I now realize this might fall under the warning about DDL, which
> >> says this:
> >>
> >>     Executing DDL commands on the source server while running
> >>     pg_createsubscriber is not recommended. If the target server has
> >>     already been converted to logical replica, the DDL commands must
> >>     not be replicated so an error would occur.
> >
> > Yeah, they would not be replicated, but that does not lead to an ERROR.
> > So should we say like "Creating tables on the source server..."?
> >
>
> Perhaps. Clarifying the docs would help, but it depends on the wording.
> For example, I doubt this should talk about "creating tables" because
> there are other DDL that (probably) could cause issues (like adding a
> column to the table, or something like that).
>
> >> 5) slot / publication / subscription name
> >>
> >> I find it somewhat annoying it's not possible to specify names for
> >> objects created by the tool - replication slots, publication and
> >> subscriptions. If this is meant to be a replica running for a while,
> >> after a while I'll have no idea what pg_createsubscriber_569853 or
> >> pg_createsubscriber_459548_2348239 was meant for.
> >>
> >> This is particularly annoying because renaming these objects later is
> >> either not supported at all (e.g. for replication slots), or may be
> >> quite difficult (e.g. publications).
> >>
> >> I do realize there are challenges with custom names (say, if there are
> >> multiple databases to replicate), but can't we support some simple
> >> formatting with basic placeholders? So we could specify
> >>
> >> --slot-name "myslot_%d_%p"
> >>
> >> or something like that?
> >
> > Not sure we can do it in the first version, but it looks nice. One concern is that I
> > cannot find applications which accept escape strings like log_line_prefix.
> > (It may just be because we do not have a use case.) Do you know of examples?
> >
>
> I can't think of a tool already doing that, but I think that's simply
> because it was not needed. Why should we be concerned about this?
>
> >> BTW what will happen if we convert multiple standbys? Can't they all get
> >> the same slot name (they all have the same database OID, and I'm not
> >> sure how much entropy the PID has)?
> >
> > I tested and the second try did not work. The primary reason was the name of the
> > publication - pg_createsubscriber_%u (oid).
> > FYI - previously we could reuse the same publications, but based on my comment [2] the
> > feature was removed. It might be too optimistic.
> >
>
> OK. I could be convinced the other limitations are reasonable for v1 and
> can be improved later, but this seems like something that needs fixing.

+1 to handle this.
Currently, a) publication name = pg_createsubscriber_%u, where %u is
the database oid, b) replication slot name = pg_createsubscriber_%u_%d,
where %u is the database oid and %d is the pid, and c) subscription name =
pg_createsubscriber_%u_%d, where %u is the database oid and %d is the pid.
How about we have a non-mandatory option like
--prefix_object_name=mysetup1, which will create a) publication name =
mysetup1_pg_createsubscriber_%u, b) replication slot name =
mysetup1_pg_createsubscriber_%u (here the pid is also removed), and c)
subscription name = mysetup1_pg_createsubscriber_%u (here the pid is also
removed).

In the default case where the user does not specify
--prefix_object_name the object names will be created without any
prefix names.
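Tomas's placeholder idea upthread could look roughly like the sketch below. The placeholder letters (%d for database name, %p for PID) and the expand_name function are illustrative only, since the thread has not settled on a format:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand a user-supplied name template such as "myslot_%d_%p":
 * %d -> database name, %p -> process id.  Illustrative sketch only;
 * real code would also validate the resulting identifier length.
 */
static void
expand_name(char *dst, size_t dstlen, const char *tmpl,
            const char *dbname, int pid)
{
    size_t      n = 0;

    for (const char *p = tmpl; *p != '\0' && n + 1 < dstlen; p++)
    {
        if (p[0] == '%' && p[1] == 'd')
        {
            n += snprintf(dst + n, dstlen - n, "%s", dbname);
            p++;                /* skip the 'd' */
        }
        else if (p[0] == '%' && p[1] == 'p')
        {
            n += snprintf(dst + n, dstlen - n, "%d", pid);
            p++;                /* skip the 'p' */
        }
        else
            dst[n++] = *p;
    }
    dst[n < dstlen ? n : dstlen - 1] = '\0';
}
```

A template with both placeholders would give distinct names per database and per run, which addresses the "same slot name on multiple standbys" concern.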

Regards,
Vignesh



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

Thanks for updating the patch, but cfbot still got angry [1].
Note that the two containers (autoconf and meson) failed at different places,
so I think it is a genuine failure. It seems that there may be a bug related to the 32-bit build.

We should look into this and fix it as soon as possible.

[1]: http://cfbot.cputube.org/highlights/all.html#4637

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Mar 7, 2024 at 10:44 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> 4) Is FOR ALL TABLES a good idea?
>
> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> sure it won't work for a number of use cases. I know in large databases
> it's common to create "work tables" (not necessarily temporary) as part
> of a batch job, but there's no need to replicate those tables.
>
> AFAIK that'd break this FOR ALL TABLES publication, because the tables
> will qualify for replication, but won't be present on the subscriber. Or
> did I miss something?
>

As the subscriber is created from standby, all the tables should be
present at least initially during and after creating the subscriber.
Users are later free to modify the publications/subscriptions.

> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> it may be an acceptable limitation, but maybe it'd be good to also
> support restricting which tables should be replicated (e.g. blacklist or
> whitelist based on table/schema name?).
>

This would be useful, but OTOH could also be enhanced in a later
version unless we think it is a must for the first version.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Mar 11, 2024 at 9:42 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Sat, 9 Mar 2024 at 00:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> >
> > >> 5) slot / publication / subscription name
> > >>
> > >> I find it somewhat annoying it's not possible to specify names for
> > >> objects created by the tool - replication slots, publication and
> > >> subscriptions. If this is meant to be a replica running for a while,
> > >> after a while I'll have no idea what pg_createsubscriber_569853 or
> > >> pg_createsubscriber_459548_2348239 was meant for.
> > >>
> > >> This is particularly annoying because renaming these objects later is
> > >> either not supported at all (e.g. for replication slots), or may be
> > >> quite difficult (e.g. publications).
> > >>
> > >> I do realize there are challenges with custom names (say, if there are
> > >> multiple databases to replicate), but can't we support some simple
> > >> formatting with basic placeholders? So we could specify
> > >>
> > >> --slot-name "myslot_%d_%p"
> > >>
> > >> or something like that?
> > >
> > > Not sure we can do in the first version, but looks nice. One concern is that I
> > > cannot find applications which accepts escape strings like log_line_prefix.
> > > (It may just because we do not have use-case.) Do you know examples?
> > >
> >
> > I can't think of a tool already doing that, but I think that's simply
> > because it was not needed. Why should we be concerned about this?
> >
>
> +1 to handle this.
> Currently, a) Publication name = pg_createsubscriber_%u, where %u is
> database oid, b) Replication slot name =  pg_createsubscriber_%u_%d,
> Where %u is database oid and %d is the pid and c) Subscription name =
> pg_createsubscriber_%u_%d, Where %u is database oid and %d is the pid
> How about we have a non mandatory option like
> --prefix_object_name=mysetup1, which will create a) Publication name =
> mysetup1_pg_createsubscriber_%u, and b) Replication slot name =
> mysetup1_pg_createsubscriber_%u (here pid is also removed) c)
> Subscription name = mysetup1_pg_createsubscriber_%u (here pid is also
> removed).
>
> In the default case where the user does not specify
> --prefix_object_name the object names will be created without any
> prefix names.
>

Tomas's idea is better in terms of usability. So, we should instead
have three switches --slot-name, --publication-name, and
--subscriber-name with some provision for appending dbids and some
unique identifier for the standby. The unique identifier can help in
creating multiple subscribers from different standbys.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Fri, Mar 8, 2024, at 6:44 AM, Hayato Kuroda (Fujitsu) wrote:
Hmm, right. I considered the improvements below. Tomas and Euler, what do you think?

I'm posting a new patchset v28.

I changed the way the check function works. From a usability
perspective, it is better to test all conditions and report all errors (if
any) at once. It avoids multiple executions in dry-run mode just to figure out
all of the issues in the initial phase. I also included tests for it using
Shlok's idea [1], although I didn't use v27-0004.

Shlok [1] reported that it was failing on Windows since the socket-directory
option was added. I added a fix for it.

Tomas pointed out that the documentation [2] does not provide a good high-level
explanation of pg_createsubscriber. I expanded the Description section and
moved the prerequisites to the Notes section. The prerequisites were grouped into
target and source conditions in their own paragraphs instead of using a list. It
seems more in line with the style of some applications.

As I said in a previous email [3], I removed the retain option.




--
Euler Taveira

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch! I have not reviewed it yet, but cfbot did not
accept your patch [1]. It seems that you missed updating the Windows part
in the 0003 patch:

```
#if !defined(WIN32)
-       sub_base_conninfo = psprintf("host=%s port=%u user=%s fallback_application_name=%s",
+       sub_base_conninfo = psprintf("host=%s port=%s user=%s fallback_application_name=%s",
                                                                 opt.socket_dir, opt.sub_port, opt.sub_username,
progname);
 #else                                                  /* WIN32 */
        sub_base_conninfo = psprintf("port=%u user=%s fallback_application_name=%s"
```

Since the change is quite trivial, I felt no need to bump the version.
PSA the new set to keep the cfbot quiet. It passes the tests on my CI.

[1]:  https://cirrus-ci.com/build/5253379782344704

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Attachment

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

Currently, when pg_createsubscriber errors out and exits mid-execution,
the standby server started by pg_createsubscriber may
still be running. I think this standby server should be stopped when
pg_createsubscriber exits on error, as this server was
started by pg_createsubscriber and may be running with
configurations that are not desired by the user (such as " -c
listen_addresses='' -c unix_socket_permissions=0700"). Thoughts?

Added a top-up patch v28-0005 to fix this issue.
I am not changing the version as v28-0001 to v28-0004 is the same as above.

Thanks and regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Mar 13, 2024, at 10:09 AM, Shlok Kyal wrote:
Added a top-up patch v28-0005 to fix this issue.
I am not changing the version as v28-0001 to v28-0004 is the same as above.

Thanks for your review!

I'm posting a new patch (v29) that merges the previous patches (v28-0002 and
v28-0003). I applied the fix provided by Hayato [1]. It was an oversight during
a rebase. I also included the patch proposed by Shlok [2] that stops the target
server on error if it is running.

Tomas suggested in [3] that maybe the PID should be replaced with something
that has more entropy. Instead of the PID, the tool now uses a random number for
the replication slot and subscription names. There is also a concern about
converting multiple standbys, which would all get the same publication name. The
same random number is added to the publication name so the run doesn't fail because
the publication already exists. Documentation was changed based on Tomas's feedback.

The user name was always included in the subscriber connection string. Let
libpq choose it instead. While on it, a new routine (get_sub_conninfo) now
contains the code to build the subscriber connection string.
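A minimal sketch of what such a routine might look like, following the format strings quoted earlier in the thread; this get_sub_conninfo is a simplified stand-in (the real routine also handles the Windows case, where there is no Unix socket directory) and deliberately omits user= so libpq picks the default:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the subscriber connection string.  Simplified stand-in for
 * illustration: no Windows branch, no escaping; user= is omitted so
 * libpq chooses the default user.
 */
static void
get_sub_conninfo(char *dst, size_t dstlen,
                 const char *socket_dir, const char *port,
                 const char *progname)
{
    snprintf(dst, dstlen, "host=%s port=%s fallback_application_name=%s",
             socket_dir, port, progname);
}
```

Note the port is passed as a string here, which is what the 0003 fix above (port=%u vs port=%s) was about.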

As I said in [4], there wasn't a way to specify a different configuration file.
If your cluster has a postgresql.conf outside PGDATA, when pg_createsubscriber
starts the server it will fail. The new --config-file option lets you specify
the postgresql.conf location so the server starts just fine.

I also made some changes in the start_standby_server routine. I replaced the
strcat and snprintf calls with appendPQExpBuffer, which is now used to build
the pg_ctl command.




--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Mar 15, 2024 at 9:23 AM Euler Taveira <euler@eulerto.com> wrote:
>

Did you consider adding options for publication/subscription/slot
names as mentioned in my previous email? As discussed in a few emails
above, it would be quite confusing for users to identify the logical
replication objects once the standby is converted to subscriber.

*
+static void
+cleanup_objects_atexit(void)
{
...
conn = connect_database(dbinfo[i].pubconninfo, false);
+ if (conn != NULL)
+ {
+ if (dbinfo[i].made_publication)
+ drop_publication(conn, &dbinfo[i]);
+ if (dbinfo[i].made_replslot)
+ drop_replication_slot(conn, &dbinfo[i], dbinfo[i].subname);
+ disconnect_database(conn, false);
+ }
+ else
+ {
+ /*
+ * If a connection could not be established, inform the user
+ * that some objects were left on primary and should be
+ * removed before trying again.
+ */
+ if (dbinfo[i].made_publication)
+ {
+ pg_log_warning("There might be a publication \"%s\" in database \"%s\" on primary",
+    dbinfo[i].pubname, dbinfo[i].dbname);
+ pg_log_warning_hint("Consider dropping this publication before trying again.");
+ }

It seems we care only for publications created on the primary. Isn't
it possible that some of the publications have been replicated to
standby by that time, for example, in case failure happens after
creating a few publications? IIUC, we don't care for standby cleanup
after failure because it can't be used for streaming replication
anymore. So, the only choice the user has is to recreate the standby
and start the pg_createsubscriber again. This sounds questionable to
me as to whether users would like this behavior. Does anyone else have
an opinion on this point?

I see the below note in the patch:
+    If <application>pg_createsubscriber</application> fails while processing,
+    then the data directory is likely not in a state that can be recovered. It
+    is true if the target server was promoted. In such a case, creating a new
+    standby server is recommended.

By reading this it is not completely clear whether the standby is not
recoverable in case of any error or only an error after the target
server is promoted. If others agree with this behavior then we should
write the detailed reason for this somewhere in the comments as well
unless it is already explained.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 11 Mar 2024 at 10:33, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> Thanks for updating the patch, but cfbot still got angry [1].
> Note that two containers (autoconf and meson) failed at different place,
> so I think it is intentional one. It seems that there may be a bug related with 32-bit build.
>
> We should see and fix as soon as possible.

I was able to reproduce this random failure and found the following reason:
the minimum recovery ending location (0/5000000) was greater than the
specified recovery_target_lsn (0/4001198). In a few random cases the
standby applies a few more WAL records after the replication slot is
created; this leads to the minimum recovery ending location being greater
than the recovery_target_lsn, because of which the server will fail
with:
FATAL:  requested recovery stop point is before consistent recovery point

I have fixed it by pausing replay on the standby server before the
replication slots get created.
The attached v29-0002-Keep-standby-server-s-minimum-recovery-point-les.patch
patch has the changes for the same.
Thoughts?
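The failing condition is a plain LSN comparison: PostgreSQL prints LSNs as two hex halves "X/Y", and here the minimum recovery ending location (0/5000000) sorts after the requested stop point (0/4001198). A sketch of the comparison (parse_lsn is illustrative, not server code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse the textual LSN form "X/Y" (two hex words) into a 64-bit value. */
static uint64_t
parse_lsn(const char *s)
{
    unsigned int hi = 0,
                 lo = 0;

    sscanf(s, "%X/%X", &hi, &lo);
    return ((uint64_t) hi << 32) | lo;
}
```

Recovery refuses to stop at a target LSN smaller than the minimum recovery point, which is exactly the FATAL quoted above.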

Regards,
Vignesh

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
Did you consider adding options for publication/subscription/slot
names as mentioned in my previous email? As discussed in a few emails
above, it would be quite confusing for users to identify the logical
replication objects once the standby is converted to subscriber.

Yes. I was planning to implement it after v1 is pushed. I started to write code
for it but I wasn't sure about the UI. The best approach I came up with was
multiple options matched by position. (I don't provide short options to avoid
possible portability issues with the order.) It means if I have 3 databases and
the following command line:

pg_createsubscriber ... --database pg1 --database pg2 --database pg3 --publication
pubx --publication puby --publication pubz

pubx, puby and pubz are created in databases pg1, pg2, and pg3, respectively.
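The positional matching can be sketched like this; struct dbinfo and pair_options are illustrative stand-ins, and the count check mirrors the later requirement that the number of name options match the number of --database options:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Pair the i-th name option with the i-th --database option, as in the
 * command line above.  Illustrative stand-in, not the tool's code: if no
 * names were given the tool generates them; if some were given, their
 * count must match the number of databases.
 */
struct dbinfo
{
    const char *dbname;
    const char *pubname;        /* NULL means "generate a name" */
};

static int
pair_options(struct dbinfo *out,
             const char **dbs, int num_dbs,
             const char **pubs, int num_pubs)
{
    if (num_pubs != 0 && num_pubs != num_dbs)
        return -1;              /* mismatched counts are an error */

    for (int i = 0; i < num_dbs; i++)
    {
        out[i].dbname = dbs[i];
        out[i].pubname = (num_pubs > 0) ? pubs[i] : NULL;
    }
    return num_dbs;
}
```

The same pairing applies to --subscription and --replication-slot.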

It seems we care only for publications created on the primary. Isn't
it possible that some of the publications have been replicated to
standby by that time, for example, in case failure happens after
creating a few publications? IIUC, we don't care for standby cleanup
after failure because it can't be used for streaming replication
anymore. So, the only choice the user has is to recreate the standby
and start the pg_createsubscriber again. This sounds questionable to
me as to whether users would like this behavior. Does anyone else have
an opinion on this point?

If it happens after creating a publication and before promotion, the cleanup
routine will drop the publications on the primary, and the drop will eventually
be applied to the standby via replication later.

I see the below note in the patch:
+    If <application>pg_createsubscriber</application> fails while processing,
+    then the data directory is likely not in a state that can be recovered. It
+    is true if the target server was promoted. In such a case, creating a new
+    standby server is recommended.

By reading this it is not completely clear whether the standby is not
recoverable in case of any error or only an error after the target
server is promoted. If others agree with this behavior then we should
write the detailed reason for this somewhere in the comments as well
unless it is already explained.

I rewrote the sentence to make it clear that only if the server is promoted
will the target server be in a state that cannot be reused. A message also
says so:

pg_createsubscriber: target server reached the consistent state
pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you
must recreate the physical replica before continuing.

I'm attaching a new version (v30) that adds:

* 3 new options (--publication, --subscription, --replication-slot) to assign
  names to the objects. The --database option used to ignore duplicate names,
  however, since these new options rely on the number of database options to
  match the number of object name options, it is forbidden from now on. The
  duplication is also forbidden for the object names to avoid errors earlier.
* rewrite the paragraph related to the unusable target server after
  pg_createsubscriber fails.
* Vignesh reported an issue [1] related to reaching the recovery stop point
  before the consistent state is reached. I proposed a simple patch that fixes
  the issue.



--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Sat, Mar 16, 2024, at 10:31 AM, vignesh C wrote:
I was able to reproduce this random failure and found the following reason:
The minimum recovery ending location (0/5000000) was greater than the
specified recovery_target_lsn (0/4001198). In a few random cases the
standby applies a few more WAL records after the replication slot is
created; this leads to the minimum recovery ending location being greater
than the recovery_target_lsn, because of which the server will fail
with:
FATAL:  requested recovery stop point is before consistent recovery point

Thanks for checking. I proposed an alternative patch for it [1]. Can you check
it?



--
Euler Taveira

Re: speed up a logical replica setup

From
vignesh C
Date:
On Sat, 16 Mar 2024 at 21:16, Euler Taveira <euler@eulerto.com> wrote:
>
> On Sat, Mar 16, 2024, at 10:31 AM, vignesh C wrote:
>
> I was able to reproduce this random failure and found the following reason:
> The Minimum recovery ending location 0/5000000 was more than the
> recovery_target_lsn specified is "0/4001198". In few random cases the
> standby applies a few more WAL records after the replication slot is
> created; this leads to minimum recovery ending location being greater
> than the recovery_target_lsn because of which the server will fail
> with:
> FATAL:  requested recovery stop point is before consistent recovery point
>
>
> Thanks for checking. I proposed an alternative patch for it [1]. Can you check
> it?

This approach looks good to me.

Regards,
Vignesh



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Sat, Mar 16, 2024 at 9:13 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
>
> Did you consider adding options for publication/subscription/slot
> names as mentioned in my previous email? As discussed in a few emails
> above, it would be quite confusing for users to identify the logical
> replication objects once the standby is converted to subscriber.
>
>
> Yes. I was wondering to implement after v1 is pushed. I started to write a code
> for it but I wasn't sure about the UI. The best approach I came up with was
> multiple options in the same order. (I don't provide short options to avoid
> possible portability issues with the order.) It means if I have 3 databases and
> the following command-line:
>
> pg_createsubscriber ... --database pg1 --database pg2 --database pg3 --publication
> pubx --publication puby --publication pubz
>
> pubx, puby and pubz are created in the database pg1, pg2, and pg3 respectively.
>

With this syntax, you want to give the user the option to specify a
publication/subscription/slot name for each database? If so, won't it
be too verbose?

> It seems we care only for publications created on the primary. Isn't
> it possible that some of the publications have been replicated to
> standby by that time, for example, in case failure happens after
> creating a few publications? IIUC, we don't care for standby cleanup
> after failure because it can't be used for streaming replication
> anymore. So, the only choice the user has is to recreate the standby
> and start the pg_createsubscriber again. This sounds questionable to
> me as to whether users would like this behavior. Does anyone else have
> an opinion on this point?
>
>
> If it happens after creating a publication and before promotion, the cleanup
> routine will drop the publications on primary and it will eventually be applied
> to the standby via replication later.
>

Do you mean to say that the next time the user runs
pg_createsubscriber, it will be ensured that all the prior WAL has been
replicated? I think we need to ensure that after the restart of this
tool, and before it attempts to create the publications again, the WAL
of the previous drop has been replayed.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Mar 18, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 9:13 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
> >
> > Did you consider adding options for publication/subscription/slot
> > names as mentioned in my previous email? As discussed in a few emails
> > above, it would be quite confusing for users to identify the logical
> > replication objects once the standby is converted to subscriber.
> >
> >
> > Yes. I was wondering to implement after v1 is pushed. I started to write a code
> > for it but I wasn't sure about the UI. The best approach I came up with was
> > multiple options in the same order. (I don't provide short options to avoid
> > possible portability issues with the order.) It means if I have 3 databases and
> > the following command-line:
> >
> > pg_createsubscriber ... --database pg1 --database pg2 --database pg3 --publication
> > pubx --publication puby --publication pubz
> >
> > pubx, puby and pubz are created in the database pg1, pg2, and pg3 respectively.
> >
>
> With this syntax, you want to give the user the option to specify
> publication/subscription/slot name for each database? If so, won't it
> be too much verbose?
>
> > It seems we care only for publications created on the primary. Isn't
> > it possible that some of the publications have been replicated to
> > standby by that time, for example, in case failure happens after
> > creating a few publications? IIUC, we don't care for standby cleanup
> > after failure because it can't be used for streaming replication
> > anymore. So, the only choice the user has is to recreate the standby
> > and start the pg_createsubscriber again. This sounds questionable to
> > me as to whether users would like this behavior. Does anyone else have
> > an opinion on this point?
> >
> >
> > If it happens after creating a publication and before promotion, the cleanup
> > routine will drop the publications on primary and it will eventually be applied
> > to the standby via replication later.
> >
>
> Do you mean to say that the next time if user uses
> pg_createsubscriber, it will be ensured that all the prior WAL will be
> replicated? I think we need to ensure that after the restart of this
> tool and before it attempts to create the publications again, the WAL
> of the previous drop has to be replayed.
>

On further thinking, the WAL for such dropped publications should get
replayed eventually, before the WAL for new publications (the
publications which will be created after restart), unless the required
WAL is removed on the primary for some reason.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch. Here are my comments.
I used Grammarly to proofread the sentences.
(The tool strongly recommends using the active voice, but I can ignore that for now.)

01.

"After a successful run, the state of the target server is analagous to a fresh
logical replication setup."
s/analagous/analogous

02.

"The main difference between the logical replication setup and pg_createsubscriber
is the initial data copy."

Grammarly suggests:
"The initial data copy is the main difference between the logical replication
setup and pg_createsubscriber."

03.

"Only the synchronization phase is done, which ensures each table is brought up
to a synchronized state."

This sentence is not very clear to me. How about:
"pg_createsubscriber does only the synchronization phase, ensuring each table's
replication state is ready."

04.

"The pg_createsubscriber targets large database systems because most of the
execution time is spent making the initial data copy."

Hmm, but the initial data sync by logical replication also spends most of its
time making the initial data copy. IIUC the bottlenecks are a) this application
must stop and start the server several times, and b) only serial copy works.
Can you clarify?

05.

It is better to state the internal difference between pg_createsubscriber and the
initial sync by logical replication. For example:
pg_createsubscriber uses the physical replication mechanism to ensure the standby
catches up to a certain point. Then, it converts the standby to a
subscriber by promoting it and creating subscriptions.

06.

"If these are not met an error will be reported."

Grammarly suggests:
"If these are not met, an error will be reported."

07.

"The given target data directory must have the same system identifier than the
source data directory."

Grammarly suggests:
"The given target data directory must have the same system identifier as the
source data directory."

08.

"If a standby server is running on the target data directory or it is a base
backup from the source data directory, system identifiers are the same."

This line is not needed if bullet style is not used. The line is just a supplement,
not a prerequisite.

09.

"The source server must accept connections from the target server. The source server must not be in recovery."

Grammarly suggests:
"The source server must accept connections from the target server and not be in recovery."

10.

"Publications cannot be created in a read-only cluster."

Same as 08, this line is not needed if bullet-style is not used.

11.

"pg_createsubscriber usually starts the target server with different connection
settings during the transformation steps. Hence, connections to target server
might fail."
 
Grammarly suggests:
"pg_createsubscriber usually starts the target server with different connection
settings during transformation. Hence, connections to the target server might fail."

12.

"During the recovery process,"

Grammarly suggests:
"During recovery,"

13.

"replicated so an error would occur."

Grammarly suggests:
"replicated, so an error would occur."

14.

"It would avoid situations in which WAL files from the source server might be
used by the target server."

Grammarly suggests:
"It would avoid situations in which the target server might use WAL files from
the source server."

15.

"a LSN"

s/a/an

16.

"of write-ahead"

s/of/of the/

17.

"specifies promote"

We can double-quote the word promote.

18.

"are also added so it avoids"

Grammarly suggests:
"are added to avoid"

19.

"is accepting read-write transactions"

Grammarly suggests:
"accepts read-write transactions"

20.

New options must also be documented. This helps not only users but also
reviewers.
(Sometimes we cannot identify whether the implementation is intentional or not.)

21.

Also, I'm not sure the specification is good. I would prefer to specify them
by a format string, because it can reduce the number of arguments, and I
cannot find use cases in which users want to control the names of the objects.

However, your approach has the benefit that users can easily identify the
objects generated by pg_createsubscriber. What do others think?

22.

```
#define    BASE_OUTPUT_DIR        "pg_createsubscriber_output.d"
```

Nothing refers to this define.

23.
 
```
}            CreateSubscriberOptions;
...
}            LogicalRepInfo;
```

The identifiers after the "}" are not needed, because we do not use typedef.
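
To illustrate the point (a standalone sketch; the names are invented, not from the patch): without typedef, the identifier after the closing brace declares a global variable of the anonymous struct type rather than a type name.

```c
#include <assert.h>

/* Without typedef, the identifier after the closing brace declares a
 * global *variable* of the anonymous struct type: */
struct
{
	int			port;
}			AccidentalGlobalVar;	/* a variable, not a type */

/* With typedef, the same position names a type instead: */
typedef struct
{
	int			port;
} IntendedOptionsType;
```

This is presumably why the patch, as written, defines unintended global variables.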

22.

While looking at the function definitions, I found that some pointers are
declared as const, but others are not. E.g., "char *lsn" in setup_recovery()
won't be changed but is not const. Is it just an omission, or is there
another rule?

23.

```
static int    num_dbs = 0;
static int    num_pubs = 0;
static int    num_subs = 0;
static int    num_replslots = 0;
```

I think the names are a bit confusing. The number of generated
publications/subscriptions/replication slots is always the same as the number
of databases; these variables just indicate the number of names specified.

My idea is num_custom_pubs or something similar. Thoughts?

24.

```
/* standby / subscriber data directory */
static char *subscriber_dir = NULL;
```

It is a bit strange that only subscriber_dir is a global variable. Callers
require the CreateSubscriberOptions as an argument, except for
cleanup_objects_atexit() and main(). So, how about making
`CreateSubscriberOptions opt` a global as well?

25.

```
 * Replication slots, publications and subscriptions are created. Depending on
 * the step it failed, it should remove the already created objects if it is
 * possible (sometimes it won't work due to a connection issue).
```

I think it should be stated here that subscriptions won't be removed, along
with the reason.

26.

```

    /*
     * If the server is promoted, there is no way to use the current setup
     * again. Warn the user that a new replication setup should be done before
     * trying again.
     */
```

Per comment 25, we could add a reference like "See comments atop the function".

27.

usage() was not updated based on recent changes.

28.

```
        if (strcmp(conn_opt->keyword, "dbname") == 0 && conn_opt->val != NULL)
        {
            if (dbname)
                *dbname = pg_strdup(conn_opt->val);
            continue;
        }
```

There is a memory leak if multiple dbname keywords are specified in the conninfo.

29.

```
    pg_prng_seed(&prng_state, (uint64) (getpid() ^ time(NULL)));
```

There is no need to initialize the seed every time. Can you reuse the
pg_prng_state?
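
The idea amounts to lazy one-time seeding, sketched below with a toy xorshift generator standing in for pg_prng (only the seeding pattern is the point; the generator itself is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

static uint64_t prng_state;
static bool prng_seeded = false;

/* Seed once on first use instead of on every call. */
static uint64_t
next_random(void)
{
	if (!prng_seeded)
	{
		prng_state = ((uint64_t) (getpid() ^ time(NULL))) | 1;	/* nonzero seed */
		prng_seeded = true;
	}
	/* xorshift64: a toy stand-in for pg_prng_uint64() */
	prng_state ^= prng_state << 13;
	prng_state ^= prng_state >> 7;
	prng_state ^= prng_state << 17;
	return prng_state;
}
```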

30.

```
        if (num_replslots == 0)
            dbinfo[i].replslotname = pg_strdup(genname);
```

I think the straightforward way is to use the name of subscription if no name
is specified. This follows the rule for CREATE SUBSCRIPTION.

31.

```
        /* Create replication slot on publisher */
        if (lsn)
            pg_free(lsn);
```

I think allocating/freeing memory is not so efficient.
Can we add a flag to create_logical_replication_slot() for controlling the
return value (NULL or a duplicated string)? We can use the condition
(i == num_dbs - 1) as the flag.

32.

```
/*
 * Close the connection. If exit_on_error is true, it has an undesired
 * condition and it should exit immediately.
 */
static void
disconnect_database(PGconn *conn, bool exit_on_error)
```

In the case of disconnect_database(), the second argument should have a
different name: if it is true, the process exits unconditionally.
Also, the comment atop the function must be fixed.


33.

```
    wal_level = strdup(PQgetvalue(res, 0, 0));
```

pg_strdup should be used here.

34.

```
        {"config-file", required_argument, NULL, 1},
        {"publication", required_argument, NULL, 2},
        {"replication-slot", required_argument, NULL, 3},
        {"subscription", required_argument, NULL, 4},
```

The ordering looks strange to me. In pg_upgrade and pg_basebackup, options
that do not have a short form are listed last.

35.

```
    opt.sub_port = palloc(16);
```

Per other lines, pg_malloc() should be used.

36.

```
                pg_free(opt.sub_port);
```

You said that the leak is not a concern here. If so, why does only 'p' have
pg_free()?

37.

```
    /* Register a function to clean up objects in case of failure */
    atexit(cleanup_objects_atexit);
```

Sorry if we have already discussed this. I think the registration can be moved
to just before the standby is started; before that point, the callback is a
no-op.

38.

```
    /* Subscriber PID file */
    snprintf(pidfile, MAXPGPATH, "%s/postmaster.pid", subscriber_dir);

    /*
     * If the standby server is running, stop it. Some parameters (that can
     * only be set at server start) are informed by command-line options.
     */
    if (stat(pidfile, &statbuf) == 0)
```

Hmm. pidfile is used only here, but it is declared in main(). Can this be
separated into another function like is_standby_started()?

39.

Or, we may be able to introduce "restart_standby_if_needed" or something
similar.
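
A sketch of such a helper (the name is hypothetical) doing the same postmaster.pid check that main() currently does inline:

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

#define MAXPGPATH 1024

/* Hypothetical helper: report whether a postmaster appears to be running
 * in the given data directory, judged by the presence of postmaster.pid. */
static bool
is_standby_started(const char *datadir)
{
	char		pidfile[MAXPGPATH];
	struct stat statbuf;

	snprintf(pidfile, sizeof(pidfile), "%s/postmaster.pid", datadir);
	return stat(pidfile, &statbuf) == 0;
}
```

This keeps pidfile local to the helper and out of main().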

40.

```
     * XXX this code was extracted from BootStrapXLOG().
```

So, can we extract the common part somewhere? Since the system identifier is
related to the control file, I think it could be located in
controldata_utils.c.

41.

You said the following in [1], but I could not find the related fix. Can you
clarify?

> That's a good point. We should state in the documentation that GUCs specified in
> the command-line options are ignored during the execution.

[1]: https://www.postgresql.org/message-id/40595e73-c7e1-463a-b8be-49792e870007%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Re: speed up a logical replica setup

From
vignesh C
Date:
On Sat, 16 Mar 2024 at 21:13, Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
>
> Did you consider adding options for publication/subscription/slot
> names as mentioned in my previous email? As discussed in a few emails
> above, it would be quite confusing for users to identify the logical
> replication objects once the standby is converted to subscriber.
>
>
> Yes. I was wondering to implement after v1 is pushed. I started to write a code
> for it but I wasn't sure about the UI. The best approach I came up with was
> multiple options in the same order. (I don't provide short options to avoid
> possible portability issues with the order.) It means if I have 3 databases and
> the following command-line:
>
> pg_createsubscriber ... --database pg1 --database pg2 --database pg3 --publication
> pubx --publication puby --publication pubz
>
> pubx, puby and pubz are created in the database pg1, pg2, and pg3 respectively.
>
> It seems we care only for publications created on the primary. Isn't
> it possible that some of the publications have been replicated to
> standby by that time, for example, in case failure happens after
> creating a few publications? IIUC, we don't care for standby cleanup
> after failure because it can't be used for streaming replication
> anymore. So, the only choice the user has is to recreate the standby
> and start the pg_createsubscriber again. This sounds questionable to
> me as to whether users would like this behavior. Does anyone else have
> an opinion on this point?
>
>
> If it happens after creating a publication and before promotion, the cleanup
> routine will drop the publications on primary and it will eventually be applied
> to the standby via replication later.
>
> I see the below note in the patch:
> +    If <application>pg_createsubscriber</application> fails while processing,
> +    then the data directory is likely not in a state that can be recovered. It
> +    is true if the target server was promoted. In such a case, creating a new
> +    standby server is recommended.
>
> By reading this it is not completely clear whether the standby is not
> recoverable in case of any error or only an error after the target
> server is promoted. If others agree with this behavior then we should
> write the detailed reason for this somewhere in the comments as well
> unless it is already explained.
>
>
> I rewrote the sentence to make it clear that only if the server is promoted,
> the target server will be in a state that cannot be reused. It provides a
> message saying it too.
>
> pg_createsubscriber: target server reached the consistent state
> pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you
> must recreate the physical replica before continuing.
>
> I'm attaching a new version (v30) that adds:
>
> * 3 new options (--publication, --subscription, --replication-slot) to assign
>   names to the objects. The --database option used to ignore duplicate names,
>   however, since these new options rely on the number of database options to
>   match the number of object name options, it is forbidden from now on. The
>   duplication is also forbidden for the object names to avoid errors earlier.
> * rewrite the paragraph related to unusable target server after
>   pg_createsubscriber fails.

1) The maximum size of an object name is 64 bytes; we could add a check so
that we don't specify more than the maximum allowed length:
+			case 3:
+				if (!simple_string_list_member(&opt.replslot_names, optarg))
+				{
+					simple_string_list_append(&opt.replslot_names, optarg);
+					num_replslots++;
+				}
+				else
+				{
+					pg_log_error("duplicate replication slot \"%s\"", optarg);
+					exit(1);
+				}
+				break;

If we allow something like this:
 ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
In this case creation of replication slot will fail:
pg_createsubscriber: error: could not create replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
database "db2": ERROR:  replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
already exists

2) Similarly here too:
+			case 4:
+				if (!simple_string_list_member(&opt.sub_names, optarg))
+				{
+					simple_string_list_append(&opt.sub_names, optarg);
+					num_subs++;
+				}
+				else
+				{
+					pg_log_error("duplicate subscription \"%s\"", optarg);
+					exit(1);
+				}
+				break;

If we allow something like this:
./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3

Subscriptions will be created with the same name, and later there will
be a problem when setting the replication progress, as there will be
multiple subscriptions with the same name.
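
A client-side guard could be as small as the following (a hypothetical sketch, not code from the patch; NAMEDATALEN mirrors the server's limit, and truncation to NAMEDATALEN - 1 bytes is what makes the long names above collide):

```c
#include <stdbool.h>
#include <string.h>

#define NAMEDATALEN 64			/* server truncates names to NAMEDATALEN - 1 bytes */

/* Hypothetical check: reject an object name the server would truncate. */
static bool
object_name_fits(const char *name)
{
	return strlen(name) < NAMEDATALEN;
}
```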

Regards,
Vignesh



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 18.03.24 08:18, vignesh C wrote:
> 1) Maximum size of the object name is 64, we can have a check so that
> we don't specify more than the maximum allowed length:
> + case 3:
> + if (!simple_string_list_member(&opt.replslot_names, optarg))
> + {
> + simple_string_list_append(&opt.replslot_names, optarg);
> + num_replslots++;
> + }
> + else
> + {
> + pg_log_error("duplicate replication slot \"%s\"", optarg);
> + exit(1);
> + }
> + break;
> 
> If we allow something like this:
>   ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
> --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
> --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
> In this case creation of replication slot will fail:
> pg_createsubscriber: error: could not create replication slot
> "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
> database "db2": ERROR:  replication slot
> "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
> already exists

I think this is fine.  The server can check whether the names it is 
given are of the right size.  We don't need to check it again in the client.

> 2) Similarly here too:
> + case 4:
> + if (!simple_string_list_member(&opt.sub_names, optarg))
> + {
> + simple_string_list_append(&opt.sub_names, optarg);
> + num_subs++;
> + }
> + else
> + {
> + pg_log_error("duplicate subscription \"%s\"", optarg);
> + exit(1);
> + }
> + break;
> 
> If we allow something like this:
> ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
> --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
> --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3
> 
> Subscriptions will be created with the same name and later there will
> be a problem when setting replication progress as there will be
> multiple subscriptions with the same name.

Could you clarify this problem?




Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 18.03.24 06:43, Hayato Kuroda (Fujitsu) wrote:
> 02.
> 
> "The main difference between the logical replication setup and pg_createsubscriber
> is the initial data copy."
> 
> Grammarly suggests:
> "The initial data copy is the main difference between the logical replication
> setup and pg_createsubscriber."

I think that change is worse.

> 09.
> 
> "The source server must accept connections from the target server. The source server must not be in recovery."
> 
> Grammarly suggests:
> "The source server must accept connections from the target server and not be in recovery."

I think the previous version is better.

> 17.
> 
> "specifies promote"
> 
> We can do double-quote for the word promote.

The v30 patch has <literal>promote</literal>, which I think is adequate.

> 19.
> 
> "is accepting read-write transactions"
> 
> Grammarly suggests:
> "accepts read-write transactions"

I like the first one better.

> 20.
> 
> New options must be also documented as well. This helps not only users but also
> reviewers.
> (Sometimes we cannot identify whether the implementation is intentional or not.)

Which ones are missing?

> 21.
> 
> Also, not sure the specification is good. I preferred to specify them by format
> string. Because it can reduce the number of arguments and I cannot find use cases
> which user want to control the name of objects.
> 
> However, your approach has a benefit which users can easily identify the generated
> objects by pg_createsubscriber. How do other think?

I think listing them explicitly is better for the first version.  It's 
simpler to implement and more flexible.

> 22.
> 
> ```
> #define    BASE_OUTPUT_DIR        "pg_createsubscriber_output.d"
> ```
> 
> No one refers the define.

This is gone in v30.

> 23.
>   
> ```
> }            CreateSubscriberOptions;
> ...
> }            LogicalRepInfo;
> ```
> 
> The identifiers after the "}" are not needed, because we do not use typedef.

Yeah, this is actually wrong, because as it is written now, it defines 
global variables.

> 22.
> 
> While looking at the function definitions, I found that some pointers are
> declared as const, but others are not. E.g., "char *lsn" in setup_recovery()
> won't be changed but is not const. Is it just an omission, or is there another rule?

Yes, more could be done here.  I have attached a patch for this.  (This 
also requires the just-committed 48018f1d8c.)

> 24.
> 
> ```
> /* standby / subscriber data directory */
> static char *subscriber_dir = NULL;
> ```
> 
> It is a bit strange that only subscriber_dir is a global variable. Callers require
> the CreateSubscriberOptions as an argument, except cleanup_objects_atexit() and
> main(). So, how about making `CreateSubscriberOptions opt` a global one?

Fewer global variables seem preferable.  Only make global as needed.

> 30.
> 
> ```
>         if (num_replslots == 0)
>             dbinfo[i].replslotname = pg_strdup(genname);
> ```
> 
> I think the straightforward way is to use the name of subscription if no name
> is specified. This follows the rule for CREATE SUBSCRIPTION.

right

> 31.
> 
> ```
>         /* Create replication slot on publisher */
>         if (lsn)
>             pg_free(lsn);
> ```
> 
> I think allocating/freeing memory is not so efficient.
> Can we add a flag to create_logical_replication_slot() for controlling the
> returning value (NULL or duplicated string)? We can use the condition (i == num_dbs-1)
> as flag.

Nothing is even using the return value of 
create_logical_replication_slot().  I think this can be removed altogether.

> 34.
> 
> ```
>         {"config-file", required_argument, NULL, 1},
>         {"publication", required_argument, NULL, 2},
>         {"replication-slot", required_argument, NULL, 3},
>         {"subscription", required_argument, NULL, 4},
> ```
> 
> The ordering looks strange to me. In pg_upgrade and pg_basebackup, options
> that do not have a short form are listed last.
> 
> 35.
> 
> ```
>     opt.sub_port = palloc(16);
> ```
> 
> Per other lines, pg_malloc() should be used.

Even better psprintf().

> 37.
> 
> ```
>     /* Register a function to clean up objects in case of failure */
>     atexit(cleanup_objects_atexit);
> ```
> 
> Sorry if we have already discussed. I think the registration can be moved just
> before the boot of the standby. Before that, the callback will be no-op.

But it can also stay where it is.  What is the advantage of moving it later?

> 40.
> 
> ```
>      * XXX this code was extracted from BootStrapXLOG().
> ```
> 
> So, can we extract the common part to somewhere? Since system identifier is related
> with the controldata file, I think it can be located in controldata_util.c.

Let's leave it as is for this PG release.


Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 16.03.24 16:42, Euler Taveira wrote:
> I'm attaching a new version (v30) that adds:

I have some review comments and attached a patch with some smaller 
fixups (mainly message wording and avoiding fixed-size string buffers).

* doc/src/sgml/ref/pg_createsubscriber.sgml

I would remove the "How It Works" section.  This is not relevant to
users, and it is very detailed and will require updating whenever the
implementation changes.  It could be a source code comment instead.

* src/bin/pg_basebackup/pg_createsubscriber.c

I think the connection string handling is not robust against funny
characters, like spaces, in database names etc.

Most SQL commands need to be amended for proper identifier or string
literal quoting and/or escaping.

In check_subscriber(): All these permissions checks seem problematic
to me.  We shouldn't reimplement our own copy of the server's
permission checks.  The server can check the permissions.  And if the
permission checking in the server ever changes, then we have
inconsistencies to take care of.  Also, the error messages "permission
denied" are inappropriate, because we are not doing the actual thing.
Maybe we want to do a dry-run for the benefit of the user, but then we
should do the actual thing, like try to create a replication slot, or
whatever.  But I would rather just remove all this, it seems too
problematic.

In main(): The first check if the standby is running is problematic.
I think it would be better to require that the standby is initially
shut down.  Consider, the standby might be running under systemd.
This tool will try to stop it, systemd will try to restart it.  Let's
avoid these kinds of battles.  It's also safer if we don't try to
touch running servers.

The -p option (--subscriber-port) doesn't seem to do anything.  In my
testing, it always uses the compiled-in default port.

Printing all the server log lines to the terminal doesn't seem very
user-friendly.  Not sure what to do about that, short of keeping a 
pg_upgrade-style directory of log files.  But it's ugly.



Re: speed up a logical replica setup

From
Alvaro Herrera
Date:
On 2024-Mar-18, Peter Eisentraut wrote:

> * src/bin/pg_basebackup/pg_createsubscriber.c
> 
> I think the connection string handling is not robust against funny
> characters, like spaces, in database names etc.

Maybe it would be easier to deal with this by passing around a struct
with keyword/value pairs that can be given to PQconnectdbParams (and
keeping dbname as a param that's passed separately, so that it can be
added when one is needed), instead of messing with the string conninfos;
then you don't have to worry about quoting.
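
That shape can be sketched as follows (the struct and helper names are illustrative, not from the patch): collect settings as parallel NULL-terminated keyword/value arrays, append dbname when one is needed, and hand them to PQconnectdbParams() so libpq never has to parse a spliced conninfo string.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONN_PARAMS 8

/* Illustrative container for PQconnectdbParams-style parallel arrays. */
struct conn_params
{
	const char *keywords[MAX_CONN_PARAMS + 1];	/* NULL-terminated */
	const char *values[MAX_CONN_PARAMS + 1];
	int			n;
};

static void
add_conn_param(struct conn_params *cp, const char *keyword, const char *value)
{
	assert(cp->n < MAX_CONN_PARAMS);
	cp->keywords[cp->n] = keyword;
	cp->values[cp->n] = value;
	cp->n++;
	cp->keywords[cp->n] = NULL;	/* keep both arrays terminated */
	cp->values[cp->n] = NULL;
}
```

A caller would then do something like `PQconnectdbParams(cp.keywords, cp.values, 0)`; because each value is passed verbatim, a database name such as `my db` needs no quoting or escaping.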

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."                   (New York Times, about Microsoft PowerPoint)



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Peter,

Thanks for the comments. I want to reply to some of them.

> > 17.
> >
> > "specifies promote"
> >
> > We can do double-quote for the word promote.
> 
> The v30 patch has <literal>promote</literal>, which I think is adequate.

Oops. Actually, I was looking at the v29 patch during my first review. Sorry
for the noise.

> 
> > 20.
> >
> > New options must be also documented as well. This helps not only users but
> also
> > reviewers.
> > (Sometimes we cannot identify that the implementation is intentinal or not.)
> 
> Which ones are missing?

In v29, the newly added options (publication/subscription/replication-slot)
were not documented. Since they have been added, please ignore.

> > 21.
> >
> > Also, not sure the specification is good. I preferred to specify them by format
> > string. Because it can reduce the number of arguments and I cannot find use
> cases
> > which user want to control the name of objects.
> >
> > However, your approach has a benefit which users can easily identify the
> generated
> > objects by pg_createsubscriber. How do other think?
> 
> I think listing them explicitly is better for the first version.  It's
> simpler to implement and more flexible.

OK.

> > 22.
> >
> > ```
> > #define    BASE_OUTPUT_DIR
>     "pg_createsubscriber_output.d"
> > ```
> >
> > No one refers the define.
> 
> This is gone in v30.

I wrote that because of the above. Please ignore...

> 
> > 31.
> >
> > ```
> >         /* Create replication slot on publisher */
> >         if (lsn)
> >             pg_free(lsn);
> > ```
> >
> > I think allocating/freeing memory is not so efficient.
> > Can we add a flag to create_logical_replication_slot() for controlling the
> > returning value (NULL or duplicated string)? We can use the condition (i ==
> num_dbs-1)
> > as flag.
> 
> Nothing is even using the return value of
> create_logical_replication_slot().  I think this can be removed altogether.

> > 37.
> >
> > ```
> >     /* Register a function to clean up objects in case of failure */
> >     atexit(cleanup_objects_atexit);
> > ```
> >
> > Sorry if we have already discussed. I think the registration can be moved just
> > before the boot of the standby. Before that, the callback will be no-op.
> 
> But it can also stay where it is.  What is the advantage of moving it later?

I thought we could reduce the risk of bugs. Previously, some bugs were
reported because the registration was done too early. However, this is not a
strong opinion.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Mar 18, 2024 at 7:22 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> In check_subscriber(): All these permissions checks seem problematic
> to me.  We shouldn't reimplement our own copy of the server's
> permission checks.  The server can check the permissions.  And if the
> permission checking in the server ever changes, then we have
> inconsistencies to take care of.  Also, the error messages "permission
> denied" are inappropriate, because we are not doing the actual thing.
> Maybe we want to do a dry-run for the benefit of the user, but then we
> should do the actual thing, like try to create a replication slot, or
> whatever.  But I would rather just remove all this, it seems too
> problematic.
>

If we remove all the checks then there is a possibility that we
fail later while creating the actual subscription. For example, if
max_replication_slots is not sufficient, it is bound to
fail in the later steps, which would be a costlier affair because by
that time the standby would have been promoted and the user won't have
any way to move forward but to re-create the standby and then use this
tool again. I think here the patch tries to mimic pg_upgrade-style
checks where we do some pre-checks.

This raises a question in my mind: how are we expecting users to know
all these required values and configure them properly before using this
tool?  IIUC, we are expecting that the user should figure out the
appropriate values for max_replication_slots,
max_logical_replication_workers, etc. by querying the number of
databases on the primary, and then either stop the standby to change
these parameters or use ALTER SYSTEM, depending on the parameter.
Similarly, there are some config requirements (like max_wal_senders,
max_replication_slots) for the primary which would be difficult for
users to know, as these are the tool's internal requirements.

The two possibilities that come to my mind are (a) pg_createsubscriber
should have some special mode/option using which the user can find out
all the required config settings, or (b) document how a user can find
the required settings. There are good chances of mistakes with option
(b).
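
The pre-check described above boils down to arithmetic like the following (a hypothetical sketch; in the real tool the numbers would come from querying the servers, e.g. via SHOW):

```c
#include <stdbool.h>

/* Hypothetical shape of such a pre-check: before promoting anything,
 * compare the number of databases to be converted against the
 * configured limit, accounting for slots already in use. */
static bool
enough_replication_slots(int num_dbs, int max_replication_slots,
						 int slots_in_use)
{
	return (max_replication_slots - slots_in_use) >= num_dbs;
}
```

Running such checks before the standby is promoted is what makes failure cheap to recover from.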

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

> I'm attaching a new version (v30) that adds:
>
> * 3 new options (--publication, --subscription, --replication-slot) to assign
>   names to the objects. The --database option used to ignore duplicate names,
>   however, since these new options rely on the number of database options to
>   match the number of object name options, it is forbidden from now on. The
>   duplication is also forbidden for the object names to avoid errors earlier.
> * rewrite the paragraph related to unusable target server after
>   pg_createsubscriber fails.
> * Vignesh reported an issue [1] related to reaching the recovery stop point
>   before the consistent state is reached. I proposed a simple patch that fixes
>   the issue.
>
> [1] https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com
>

I have added a top-up patch v30-0003. The issue in [1] still exists in
the v30 patch. I was not able to come up with an approach to handle it
in the code, so I have added it to the documentation in the warning
section. Thoughts?
I am not changing the version as I have not made any changes in
v30-0001 and v30-0002.

[1]: https://www.postgresql.org/message-id/CAHv8Rj+5mzK9Jt+7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g@mail.gmail.com



Thanks and regards,
Shlok Kyal


Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 19.03.24 08:05, Amit Kapila wrote:
> On Mon, Mar 18, 2024 at 7:22 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> In check_subscriber(): All these permissions checks seem problematic
>> to me.  We shouldn't reimplement our own copy of the server's
>> permission checks.  The server can check the permissions.  And if the
>> permission checking in the server ever changes, then we have
>> inconsistencies to take care of.  Also, the error messages "permission
>> denied" are inappropriate, because we are not doing the actual thing.
>> Maybe we want to do a dry-run for the benefit of the user, but then we
>> should do the actual thing, like try to create a replication slot, or
>> whatever.  But I would rather just remove all this, it seems too
>> problematic.
>>
> 
> If we remove all the checks then there is a possibility that we can
> fail later while creating the actual subscription. For example, if
> there are not sufficient max_replication_slots, then it is bound to
> fail in the later steps which would be a costlier affair because by
> that time the standby would have been promoted and the user won't have
> any way to move forward but to re-create standby and then use this
> tool again. I think here the patch tries to mimic pg_upgrade style
> checks where we do some pre-checks.

I think checking for required parameter settings is fine.  My concern is 
with the code before that, that does 
pg_has_role/has_database_privilege/has_function_privilege.




Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 19.03.24 12:26, Shlok Kyal wrote:
>> I'm attaching a new version (v30) that adds:
>>
>> * 3 new options (--publication, --subscription, --replication-slot) to assign
>>    names to the objects. The --database option used to ignore duplicate names,
>>    however, since these new options rely on the number of database options to
>>    match the number of object name options, it is forbidden from now on. The
>>    duplication is also forbidden for the object names to avoid errors earlier.
>> * rewrite the paragraph related to unusable target server after
>>    pg_createsubscriber fails.
>> * Vignesh reported an issue [1] related to reaching the recovery stop point
>>    before the consistent state is reached. I proposed a simple patch that fixes
>>    the issue.
>>
>> [1] https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com
>>
> 
> I have added a top-up patch v30-0003. The issue in [1] still exists in
> the v30 patch. I was not able to come up with an approach to handle it
> in the code, so I have added it to the documentation in the warning
> section. Thoughts?

Seems acceptable to me.  pg_createsubscriber will probably always have 
some restrictions and unsupported edge cases like that.  We can't 
support everything, so documenting is ok.




Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 18 Mar 2024 at 16:36, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 18.03.24 08:18, vignesh C wrote:
> > 1) Maximum size of the object name is 64, we can have a check so that
> > we don't specify more than the maximum allowed length:
> > + case 3:
> > + if (!simple_string_list_member(&opt.replslot_names, optarg))
> > + {
> > + simple_string_list_append(&opt.replslot_names, optarg);
> > + num_replslots++;
> > + }
> > + else
> > + {
> > + pg_log_error("duplicate replication slot \"%s\"", optarg);
> > + exit(1);
> > + }
> > + break;
> >
> > If we allow something like this:
> >   ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> > user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> > db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
> > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
> > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
> > In this case creation of replication slot will fail:
> > pg_createsubscriber: error: could not create replication slot
> > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
> > database "db2": ERROR:  replication slot
> > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
> > already exists
>
> I think this is fine.  The server can check whether the names it is
> given are of the right size.  We don't need to check it again in the client.
>
> > 2) Similarly here too:
> > + case 4:
> > + if (!simple_string_list_member(&opt.sub_names, optarg))
> > + {
> > + simple_string_list_append(&opt.sub_names, optarg);
> > + num_subs++;
> > + }
> > + else
> > + {
> > + pg_log_error("duplicate subscription \"%s\"", optarg);
> > + exit(1);
> > + }
> > + break;
> >
> > If we allow something like this:
> > ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> > user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> > db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
> > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
> > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3
> >
> > Subscriptions will be created with the same name and later there will
> > be a problem when setting replication progress as there will be
> > multiple subscriptions with the same name.
>
> Could you clarify this problem?

In this case the subscription names specified are longer than the allowed
length, so the names will be truncated and the subscriptions for db1 and
db2 will end up with the same name, as shown below:
db2=# select subname, subdbid from pg_subscription;
                             subname                             | subdbid
-----------------------------------------------------------------+---------
 testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes |   16384
 testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes |   16385

Now we try to set the replication origin of the subscriptions to a
consistent lsn from the following:
+set_replication_progress(PGconn *conn, struct LogicalRepInfo *dbinfo,
const char *lsn)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+       Oid                     suboid;
+       char            originname[NAMEDATALEN];
+       char            lsnstr[17 + 1]; /* MAXPG_LSNLEN = 17 */
+
+       Assert(conn != NULL);
+
+       appendPQExpBuffer(str,
+                                         "SELECT oid FROM
pg_catalog.pg_subscription "
+                                         "WHERE subname = '%s'",
+                                         dbinfo->subname);
+
+       res = PQexec(conn, str->data);
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not obtain subscription OID: %s",
+                                        PQresultErrorMessage(res));
+               disconnect_database(conn, true);
+       }
+
+       if (PQntuples(res) != 1 && !dry_run)
+       {
+               pg_log_error("could not obtain subscription OID: got
%d rows, expected %d rows",
+                                        PQntuples(res), 1);
+               disconnect_database(conn, true);
+       }

Since the subscription name is truncated, we will have multiple
records returned for the above query which results in failure with:
pg_createsubscriber: error: could not obtain subscription OID: got 2
rows, expected 1 rows
pg_createsubscriber: warning: pg_createsubscriber failed after the end
of recovery
pg_createsubscriber: hint: The target server cannot be used as a
physical replica anymore.
pg_createsubscriber: hint: You must recreate the physical replica
before continuing.

The problem with this failure is that the standby has already been promoted
and we will have to re-create the physical replica again.

If you are not planning to have the checks for name length, this could
alternatively be fixed by including database id also while querying
pg_subscription like below in set_replication_progress function:
appendPQExpBuffer(str,
"SELECT oid FROM pg_catalog.pg_subscription \n"
"WHERE subname = '%s' AND subdbid = (SELECT oid FROM
pg_catalog.pg_database WHERE datname = '%s')",
dbinfo->subname,
dbinfo->dbname);

I have verified this fixes the issue.
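For reference, the collision stems from PostgreSQL silently truncating identifiers to NAMEDATALEN - 1 (63) bytes. A minimal sketch of why the three 64-character names above collapse to one (plain Python, just to illustrate the truncation rule; not part of the patch):

```python
NAMEDATALEN = 64  # PostgreSQL's compile-time identifier limit (includes the NUL)

# The three names passed via --subscription: 60 chars of "test" plus "tes1" etc.
names = ["test" * 15 + f"tes{i}" for i in (1, 2, 3)]
assert all(len(n) == 64 for n in names)

# The server silently truncates identifiers to NAMEDATALEN - 1 = 63 bytes,
# which drops the distinguishing trailing digit.
truncated = [n[:NAMEDATALEN - 1] for n in names]

# All three collapse to the same subscription name, so an OID lookup by
# subname alone returns multiple rows.
print(len(set(truncated)))  # distinct names after truncation
```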

Regards,
Vignesh



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 19.03.24 16:24, vignesh C wrote:
> The problem with this failure is that the standby has already been promoted
> and we will have to re-create the physical replica again.
> 
> If you are not planning to have the checks for name length, this could
> alternatively be fixed by including database id also while querying
> pg_subscription like below in set_replication_progress function:
> appendPQExpBuffer(str,
> "SELECT oid FROM pg_catalog.pg_subscription \n"
> "WHERE subname = '%s' AND subdbid = (SELECT oid FROM
> pg_catalog.pg_database WHERE datname = '%s')",
> dbinfo->subname,
> dbinfo->dbname);

Yes, this is more correct anyway, because subscription names are 
per-database, not global.  So you should be able to make 
pg_createsubscriber use the same subscription name for each database.
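Since subscription names are only unique per database, keying the lookup on (subname, subdbid) is enough to disambiguate. A toy sketch of that lookup logic (plain Python simulating the two catalog rows from the truncated-name example above; the row values are illustrative, not real catalog contents):

```python
# Simulated pg_subscription rows: same truncated subname, different databases.
pg_subscription = [
    {"oid": 20001, "subname": "trunc_name", "subdbid": 16384},
    {"oid": 20002, "subname": "trunc_name", "subdbid": 16385},
]

def subscription_oid(subname, subdbid):
    """Mimic the fixed query: filter by both subname and subdbid."""
    rows = [r for r in pg_subscription
            if r["subname"] == subname and r["subdbid"] == subdbid]
    assert len(rows) == 1, f"got {len(rows)} rows, expected 1"
    return rows[0]["oid"]

# Filtering on subname alone would match two rows; adding subdbid is unambiguous.
print(subscription_oid("trunc_name", 16385))  # → 20002
```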





Re: speed up a logical replica setup

From
Shlok Kyal
Date:
> Hi,
>
> > I'm attaching a new version (v30) that adds:
> >
> > * 3 new options (--publication, --subscription, --replication-slot) to assign
> >   names to the objects. The --database option used to ignore duplicate names,
> >   however, since these new options rely on the number of database options to
> >   match the number of object name options, it is forbidden from now on. The
> >   duplication is also forbidden for the object names to avoid errors earlier.
> > * rewrite the paragraph related to unusable target server after
> >   pg_createsubscriber fails.
> > * Vignesh reported an issue [1] related to reaching the recovery stop point
> >   before the consistent state is reached. I proposed a simple patch that fixes
> >   the issue.
> >
> > [1] https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com
> >
>
> I have added a top-up patch v30-0003. The issue in [1] still exists in
> the v30 patch. I was not able to come up with an approach to handle it
> in the code, so I have added it to the documentation in the warning
> section. Thoughts?
> I am not changing the version as I have not made any changes in
> v30-0001 and v30-0002.
>
> [1]: https://www.postgresql.org/message-id/CAHv8Rj+5mzK9Jt+7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g@mail.gmail.com

There was a whitespace error in the v30-0003 patch. Updated the patch.



Thanks and regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Tue, Mar 19, 2024 at 8:54 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Mon, 18 Mar 2024 at 16:36, Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > On 18.03.24 08:18, vignesh C wrote:
> > > 1) Maximum size of the object name is 64, we can have a check so that
> > > we don't specify more than the maximum allowed length:
> > > + case 3:
> > > + if (!simple_string_list_member(&opt.replslot_names, optarg))
> > > + {
> > > + simple_string_list_append(&opt.replslot_names, optarg);
> > > + num_replslots++;
> > > + }
> > > + else
> > > + {
> > > + pg_log_error("duplicate replication slot \"%s\"", optarg);
> > > + exit(1);
> > > + }
> > > + break;
> > >
> > > If we allow something like this:
> > >   ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> > > user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> > > db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
> > > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
> > > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
> > > In this case creation of replication slot will fail:
> > > pg_createsubscriber: error: could not create replication slot
> > > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
> > > database "db2": ERROR:  replication slot
> > > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
> > > already exists
> >
> > I think this is fine.  The server can check whether the names it is
> > given are of the right size.  We don't need to check it again in the client.
> >
> > > 2) Similarly here too:
> > > + case 4:
> > > + if (!simple_string_list_member(&opt.sub_names, optarg))
> > > + {
> > > + simple_string_list_append(&opt.sub_names, optarg);
> > > + num_subs++;
> > > + }
> > > + else
> > > + {
> > > + pg_log_error("duplicate subscription \"%s\"", optarg);
> > > + exit(1);
> > > + }
> > > + break;
> > >
> > > If we allow something like this:
> > > ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> > > user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> > > db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
> > > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
> > > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3
> > >
> > > Subscriptions will be created with the same name and later there will
> > > be a problem when setting replication progress as there will be
> > > multiple subscriptions with the same name.
> >
> > Could you clarify this problem?
>
> In this case the subscription names specified are longer than the allowed
> length, so the names will be truncated and the subscriptions for db1 and
> db2 will end up with the same name, as shown below:
> db2=# select subname, subdbid from pg_subscription;
>                              subname                             | subdbid
> -----------------------------------------------------------------+---------
>  testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes |   16384
>  testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes |   16385
>
> Now we try to set the replication origin of the subscriptions to a
> consistent lsn from the following:
> +set_replication_progress(PGconn *conn, struct LogicalRepInfo *dbinfo,
> const char *lsn)
> +{
> +       PQExpBuffer str = createPQExpBuffer();
> +       PGresult   *res;
> +       Oid                     suboid;
> +       char            originname[NAMEDATALEN];
> +       char            lsnstr[17 + 1]; /* MAXPG_LSNLEN = 17 */
> +
> +       Assert(conn != NULL);
> +
> +       appendPQExpBuffer(str,
> +                                         "SELECT oid FROM
> pg_catalog.pg_subscription "
> +                                         "WHERE subname = '%s'",
> +                                         dbinfo->subname);
> +
> +       res = PQexec(conn, str->data);
> +       if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +       {
> +               pg_log_error("could not obtain subscription OID: %s",
> +                                        PQresultErrorMessage(res));
> +               disconnect_database(conn, true);
> +       }
> +
> +       if (PQntuples(res) != 1 && !dry_run)
> +       {
> +               pg_log_error("could not obtain subscription OID: got
> %d rows, expected %d rows",
> +                                        PQntuples(res), 1);
> +               disconnect_database(conn, true);
> +       }
>
> Since the subscription name is truncated, we will have multiple
> records returned for the above query which results in failure with:
> pg_createsubscriber: error: could not obtain subscription OID: got 2
> rows, expected 1 rows
> pg_createsubscriber: warning: pg_createsubscriber failed after the end
> of recovery
> pg_createsubscriber: hint: The target server cannot be used as a
> physical replica anymore.
> pg_createsubscriber: hint: You must recreate the physical replica
> before continuing.
>
> The problem with this failure is that the standby has already been promoted
> and we will have to re-create the physical replica again.
>
> If you are not planning to have the checks for name length, this could
> alternatively be fixed by including database id also while querying
> pg_subscription like below in set_replication_progress function:
> appendPQExpBuffer(str,
> "SELECT oid FROM pg_catalog.pg_subscription \n"
> "WHERE subname = '%s' AND subdbid = (SELECT oid FROM
> pg_catalog.pg_database WHERE datname = '%s')",
> dbinfo->subname,
> dbinfo->dbname);

The attached patch has the changes to handle the same.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Mar 18, 2024, at 10:52 AM, Peter Eisentraut wrote:
On 16.03.24 16:42, Euler Taveira wrote:
> I'm attaching a new version (v30) that adds:

I have some review comments and attached a patch with some smaller 
fixups (mainly message wording and avoid fixed-size string buffers).

Thanks for your review. I'm attaching a new version (v32) that includes your
fixups, merges the v30-0002 into the main patch [1], addresses Hayato's review[2],
your reviews [3][4], and fixes the query for set_replication_progress() [5].

* doc/src/sgml/ref/pg_createsubscriber.sgml

I would remove the "How It Works" section.  This is not relevant to
users, and it is very detailed and will require updating whenever the
implementation changes.  It could be a source code comment instead.

It uses the same structure as pg_rewind that also describes how it works
internally. I included a separate patch that completely removes the section.

* src/bin/pg_basebackup/pg_createsubscriber.c

I think the connection string handling is not robust against funny
characters, like spaces, in database names etc.

get_base_conninfo() uses PQconninfoParse to parse the connection string. I
expect PQconnectdb to provide a suitable error message in this case. Even if it
builds keywords and values arrays, it is also susceptible to the same issue, no?

Most SQL commands need to be amended for proper identifier or string
literal quoting and/or escaping.

I completely forgot about this detail when I added the new options in v30. It is
fixed now. I also changed the tests to exercise it.
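For illustration, the quoting rules at stake can be sketched as follows (plain Python mimicking the standard SQL convention of doubling embedded quotes; the actual patch presumably uses libpq helpers such as PQescapeIdentifier, and real code must also account for the connection encoding):

```python
def quote_identifier(name: str) -> str:
    """Double embedded double quotes and wrap in double quotes (SQL identifier)."""
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value: str) -> str:
    """Double embedded single quotes and wrap in single quotes (SQL literal)."""
    return "'" + value.replace("'", "''") + "'"

# A database name with funny characters stays a single, safe identifier:
print(quote_identifier('my "weird" db'))   # "my ""weird"" db"
print(quote_literal("it's a db"))          # 'it''s a db'
```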

In check_subscriber(): All these permissions checks seem problematic
to me.  We shouldn't reimplement our own copy of the server's
permission checks.  The server can check the permissions.  And if the
permission checking in the server ever changes, then we have
inconsistencies to take care of.  Also, the error messages "permission
denied" are inappropriate, because we are not doing the actual thing.
Maybe we want to do a dry-run for the benefit of the user, but then we
should do the actual thing, like try to create a replication slot, or
whatever.  But I would rather just remove all this, it seems too
problematic.

The main goal of the check_* functions is to minimize errors during execution.
I removed the permission checks. The GUC checks were kept.

In main(): The first check if the standby is running is problematic.
I think it would be better to require that the standby is initially
shut down.  Consider, the standby might be running under systemd.
This tool will try to stop it, systemd will try to restart it.  Let's
avoid these kinds of battles.  It's also safer if we don't try to
touch running servers.

That's a good point. I hadn't found an excuse to simplify this but you provided
one. :) There was a worry about ignoring some command-line options that change
GUCs if the server was started. There was also an ugly case for dry run mode
that has to start the server (if it was running) at the end. Both cases are no
longer issues. The current code provides a suitable error if the target server
is running.

The -p option (--subscriber-port) doesn't seem to do anything.  In my
testing, it always uses the compiled-in default port.

It works for me. See this snippet from the regression tests. The port (50945) is
used by pg_ctl.

# Running: pg_createsubscriber --verbose --verbose --pgdata /c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata --publisher-server port=50943 host=/tmp/qpngb0bPKo dbname='pg1' --socket-directory /tmp/qpngb0bPKo --subscriber-port 50945 --database pg1 --database pg2
pg_createsubscriber: validating connection string on publisher
.
.
pg_createsubscriber: pg_ctl command is: "/c/pg_createsubscriber/tmp_install/c/pg_createsubscriber/bin/pg_ctl" start -D "/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata" -s -o "-p 50945" -o "-c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/tmp/qpngb0bPKo'"
2024-03-20 18:15:24.517 -03 [105195] LOG:  starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-03-20 18:15:24.517 -03 [105195] LOG:  listening on Unix socket "/tmp/qpngb0bPKo/.s.PGSQL.50945"

Printing all the server log lines to the terminal doesn't seem very
user-friendly.  Not sure what to do about that, short of keeping a 
pg_upgrade-style directory of log files.  But it's ugly.

I removed the previous implementation that creates a new directory and stores
the log file there. I don't like the pg_upgrade-style directory because (a) it
stores part of the server log files in another place and (b) it is another
directory to ignore if your tool handles the data directory (like a backup
tool). My last test said it prints 35 server log lines. I expect that the user
redirects the output to a file so he/she can inspect it later if required.



--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Mar 18, 2024, at 2:43 AM, Hayato Kuroda (Fujitsu) wrote:
Thanks for updating the patch. Here are my comments.
I used Grammarly to proofread sentences.
(The tool strongly recommends using the active voice, but I can ignore that for now)

Thanks for another review. I posted a new patch (v32) that hopefully addresses
these points.

01.

"After a successful run, the state of the target server is analagous to a fresh
logical replication setup."
a/analagous/analogous

Fixed.

02.

"The main difference between the logical replication setup and pg_createsubscriber
is the initial data copy."

Grammarly suggests:
"The initial data copy is the main difference between the logical replication
setup and pg_createsubscriber."

Not fixed.

03.

"Only the synchronization phase is done, which ensures each table is brought up
to a synchronized state."

This sentence is not very clear to me. How about:
"pg_createsubscriber does only the synchronization phase, ensuring each table's
replication state is ready."

I avoided pg_createsubscriber at the beginning because it is already
used in the previous sentence. I kept the last part of the sentence
because it is similar to one in the logical replication [1].

04.

"The pg_createsubscriber targets large database systems because most of the
execution time is spent making the initial data copy."

Hmm, but the initial data sync by logical replication also spends most of its
time making the initial data copy. IIUC the bottlenecks are a) this application
must stop and start the server several times, and b) only serial copy works.
Can you clarify?

Reading the sentence again, it is not clear. When I said "most of
execution time" I was referring to the actual logical replication setup.

05.

It is better to say the internal difference between pg_createsubscriber and the
initial sync by logical replication. For example:
pg_createsubscriber uses a physical replication mechanism to ensure the standby
catches up to a certain point. Then, it converts the standby to a subscriber by
promoting it and creating subscriptions.

Isn't it better to leave these details to "How It Works"?

06.

"If these are not met an error will be reported."

Grammarly suggests:
"If these are not met, an error will be reported."

Fixed.

07.

"The given target data directory must have the same system identifier than the
source data directory."

Grammarly suggests:
"The given target data directory must have the same system identifier as the
source data directory."

Fixed.

08.

"If a standby server is running on the target data directory or it is a base
backup from the source data directory, system identifiers are the same."

This line is not needed if bullet-style is not used. The line is just a
supplement, not a prerequisite.

Fixed.

09.

"The source server must accept connections from the target server. The source server must not be in recovery."

Grammarly suggests:
"The source server must accept connections from the target server and not be in recovery."

Not fixed.

10.

"Publications cannot be created in a read-only cluster."

Same as 08, this line is not needed if bullet-style is not used.

Fixed.

11.

"pg_createsubscriber usually starts the target server with different connection
settings during the transformation steps. Hence, connections to target server
might fail."
 
Grammarly suggests:
"pg_createsubscriber usually starts the target server with different connection
settings during transformation. Hence, connections to the target server might fail."

Fixed.

12.

"During the recovery process,"

Grammarly suggests:
"During recovery,"

Not fixed. Our documentation uses "recovery process".

13.

"replicated so an error would occur."

Grammarly suggests:
"replicated, so an error would occur."

I didn't find this one. Maybe you checked a previous version.

14.

"It would avoid situations in which WAL files from the source server might be
used by the target server."

Grammarly suggests:
"It would avoid situations in which the target server might use WAL files from
the source server."

Fixed.

15.

"a LSN"

s/a/an

Fixed.

16.

"of write-ahead"

s/of/of the/

Fixed.

17.

"specifies promote"

We can do double-quote for the word promote.

Why? It is referring to recovery_target_action. If you check this GUC,
you will notice that it also uses the literal tag.

18.

"are also added so it avoids"

Grammarly suggests:
"are added to avoid"

Fixed.

19.

"is accepting read-write transactions"

Grammarly suggests:
"accepts read-write transactions"

Not fixed.

20.

New options must also be documented. This helps not only users but also
reviewers.
(Sometimes we cannot tell whether the implementation is intentional or not.)

I don't know what you are referring to. If the new options are
--publication, --subscription and --replication-slot, they are
documented. Are you checking the latest patch?

21.

Also, I am not sure the specification is good. I would prefer to specify them by
a format string, because it can reduce the number of arguments, and I cannot find
use cases in which users want to control the names of the objects.

However, your approach has the benefit that users can easily identify the
objects generated by pg_createsubscriber. What do others think?

I prefer explicit options. We can always expand it later if people think it is a
good idea to provide a format string.

22.

```
#define BASE_OUTPUT_DIR "pg_createsubscriber_output.d"
```

Nothing refers to this define.

It was removed in v30.

23.
 
```
} CreateSubscriberOptions;
...
} LogicalRepInfo;
```

Declarations after the "{" are not needed, because we do not do typedef.

It is a leftover when I removed the typedef.

22.

While looking at the function definitions, I found that some pointers are declared
as const, but others are not. E.g., "char *lsn" in setup_recovery() won't be
changed but is not declared const. Is it just missing or is there another rule?

It slipped my mind. Peter's fixups improves it.

23.

```
static int num_dbs = 0;
static int num_pubs = 0;
static int num_subs = 0;
static int num_replslots = 0;
```

I think the names are a bit confusing. The number of publications/subscriptions/replication slots
to be generated is always the same as the number of databases. These variables
just indicate how many were specified.

My idea is num_custom_pubs or something. Thought?

What does "custom" add to make the name clear? I added comments saying
so.

24.

```
/* standby / subscriber data directory */
static char *subscriber_dir = NULL;
```

It is a bit strange that only subscriber_dir is a global variable. Callers require
the CreateSubscriberOptions as an argument, except cleanup_objects_atexit() and
main(). So, how about making `CreateSubscriberOptions opt` a global one?

I avoided turning all the options into global variables. Since the cleanup routine
required the target data directory to be a global variable, I just did it and
left the others alone.

25.

```
* Replication slots, publications and subscriptions are created. Depending on
* the step it failed, it should remove the already created objects if it is
* possible (sometimes it won't work due to a connection issue).
```

I think it should be specified here that subscriptions won't be removed, along
with the reason.

I rephrased this comment.

26.

```

/*
* If the server is promoted, there is no way to use the current setup
* again. Warn the user that a new replication setup should be done before
* trying again.
*/
```

Per comment 25, we can add a reference like "See comments atop the function"

It is a few lines above. I don't think you have to point it out. If you
are unsure about this decision, you should check the whole function.

27.

usage() was not updated based on recent changes.

Check v30.

28.

```
if (strcmp(conn_opt->keyword, "dbname") == 0 && conn_opt->val != NULL)
{
if (dbname)
*dbname = pg_strdup(conn_opt->val);
continue;
}
```

There is a memory leak if multiple dbname values are specified in the conninfo.

It is not a worrying or critical memory leak.

29.

```
pg_prng_seed(&prng_state, (uint64) (getpid() ^ time(NULL)));
```

No need to initialize the seed every time. Can you reuse pg_prng_state?

Sure.

30.

```
if (num_replslots == 0)
dbinfo[i].replslotname = pg_strdup(genname);
```

I think the straightforward way is to use the name of subscription if no name
is specified. This follows the rule for CREATE SUBSCRIPTION.

Agreed.

31.

```
/* Create replication slot on publisher */
if (lsn)
pg_free(lsn);
```

I think allocating/freeing memory like this is not so efficient.
Can we add a flag to create_logical_replication_slot() to control the return
value (NULL or a duplicated string)? We can use the condition (i == num_dbs-1)
as the flag.

It is not. This code path is not critical. You are suggesting adding
complexity here. Efficiency is a good goal, but in this case it only adds
complexity with little return.

32.

```
/*
* Close the connection. If exit_on_error is true, it has an undesired
* condition and it should exit immediately.
*/
static void
disconnect_database(PGconn *conn, bool exit_on_error)
```

In case of disconnect_database(), the second argument should have a different
name. If it is true, the process exits unconditionally.
Also, the comments atop the function must be fixed.

I chose a short name. The comment seems OK to me.


33.

```
wal_level = strdup(PQgetvalue(res, 0, 0));
```

pg_strdup should be used here.

Fixed.

34.

```
{"config-file", required_argument, NULL, 1},
{"publication", required_argument, NULL, 2},
{"replication-slot", required_argument, NULL, 3},
{"subscription", required_argument, NULL, 4},
```

The ordering looks strange to me. According to pg_upgrade and pg_basebackup,
options which do not have a short notation are listed last.

Fixed.

35.

```
opt.sub_port = palloc(16);
```

Per other lines, pg_alloc() should be used.

I think you meant pg_malloc. Fixed.

36.

```
pg_free(opt.sub_port);
```

You said that the leak won't be concerned here. If so, why only 'p' has pg_free()?

Fixed.

37.

```
/* Register a function to clean up objects in case of failure */
atexit(cleanup_objects_atexit);
```

Sorry if we have already discussed this. I think the registration can be moved to
just before the boot of the standby. Before that, the callback is a no-op.

The main reason is to catch future cases added *before* the point you
want to move this call to that require a cleanup. As you said, it is a
no-op. My preference for atexit() calls is to add them as early as
possible to avoid missing cases in which they should trigger.

38.

```
/* Subscriber PID file */
snprintf(pidfile, MAXPGPATH, "%s/postmaster.pid", subscriber_dir);

/*
* If the standby server is running, stop it. Some parameters (that can
* only be set at server start) are informed by command-line options.
*/
if (stat(pidfile, &statbuf) == 0)
```

Hmm. pidfile is used only here, but it is declared in main(). Can it be
separated into another function like is_standby_started()?

It is so small that I didn't bother adding a new function for it.

39.

Or, we may be able to introduce "restart_standby_if_needed" or something.

40.

```
* XXX this code was extracted from BootStrapXLOG().
```

So, can we extract the common part somewhere? Since the system identifier is
related to the controldata file, I think it could be located in controldata_util.c.

I added this comment here as a reference from where I extracted the
code. The referred function is from backend. Feel free to propose a
separate patch for it.

41.

You said the following in [1], but I could not find the related fix. Can you clarify?

> That's a good point. We should state in the documentation that GUCs specified in
> the command-line options are ignored during the execution.

I added a sentence for it. See "How It Works".




--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Mar 19, 2024, at 8:57 AM, Peter Eisentraut wrote:
On 19.03.24 12:26, Shlok Kyal wrote:

> I have added a top-up patch v30-0003. The issue in [1] still exists in
> the v30 patch. I was not able to come up with an approach to handle it
> in the code, so I have added it to the documentation in the warning
> section. Thoughts?

Seems acceptable to me.  pg_createsubscriber will probably always have 
some restrictions and unsupported edge cases like that.  We can't 
support everything, so documenting is ok.

Shlok, I'm not sure we should add a sentence about a pilot error. I added a
comment in check_subscriber that describes this situation. I think the comment
is sufficient to understand the limitation and, if it is possible in the future,
a check might be added for it. I didn't include v31-0004.


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Mar 20, 2024, at 7:16 AM, Shubham Khanna wrote:
On Tue, Mar 19, 2024 at 8:54 PM vignesh C <vignesh21@gmail.com> wrote:
>
> If you are not planning to have the checks for name length, this could
> alternatively be fixed by including database id also while querying
> pg_subscription like below in set_replication_progress function:
> appendPQExpBuffer(str,
> "SELECT oid FROM pg_catalog.pg_subscription \n"
> "WHERE subname = '%s' AND subdbid = (SELECT oid FROM
> pg_catalog.pg_database WHERE datname = '%s')",
> dbinfo->subname,
> dbinfo->dbname);

The attached patch has the changes to handle the same.

I included a different query that does the same. See v32.


--
Euler Taveira

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

There is a compilation error while building postgres with the patch
due to a recent commit. I have attached a top-up patch v32-0003 to
resolve this compilation error.
I have not updated the version of the patch as I have not made any
change in v32-0001 and v32-0002 patch.

Thanks and regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Thu, 21 Mar 2024 at 09:50, Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Mar 18, 2024, at 10:52 AM, Peter Eisentraut wrote:
>
> On 16.03.24 16:42, Euler Taveira wrote:
> > I'm attaching a new version (v30) that adds:
>
> I have some review comments and attached a patch with some smaller
> fixups (mainly message wording and avoid fixed-size string buffers).
>
>
> Thanks for your review. I'm attaching a new version (v32) that includes your
> fixups, merges the v30-0002 into the main patch [1], addresses Hayato's review[2],
> your reviews [3][4], and fixes the query for set_replication_progress() [5].
>
> * doc/src/sgml/ref/pg_createsubscriber.sgml
>
> I would remove the "How It Works" section.  This is not relevant to
> users, and it is very detailed and will require updating whenever the
> implementation changes.  It could be a source code comment instead.
>
>
> It uses the same structure as pg_rewind that also describes how it works
> internally. I included a separate patch that completely removes the section.
>
> * src/bin/pg_basebackup/pg_createsubscriber.c
>
> I think the connection string handling is not robust against funny
> characters, like spaces, in database names etc.
>
>
> get_base_conninfo() uses PQconninfoParse to parse the connection string. I
> expect PQconnectdb to provide a suitable error message in this case. Even if it
> builds keywords and values arrays, it is also susceptible to the same issue, no?
>
> Most SQL commands need to be amended for proper identifier or string
> literal quoting and/or escaping.
>
>
> I completely forgot about this detail when I added the new options in v30. It is
> fixed now. I also changed the tests to exercise it.
>
> In check_subscriber(): All these permissions checks seem problematic
> to me.  We shouldn't reimplement our own copy of the server's
> permission checks.  The server can check the permissions.  And if the
> permission checking in the server ever changes, then we have
> inconsistencies to take care of.  Also, the error messages "permission
> denied" are inappropriate, because we are not doing the actual thing.
> Maybe we want to do a dry-run for the benefit of the user, but then we
> should do the actual thing, like try to create a replication slot, or
> whatever.  But I would rather just remove all this, it seems too
> problematic.
>
>
> The main goal of the check_* functions is to minimize errors during execution.
> I removed the permission checks. The GUC checks were kept.
>
> In main(): The first check if the standby is running is problematic.
> I think it would be better to require that the standby is initially
> shut down.  Consider, the standby might be running under systemd.
> This tool will try to stop it, systemd will try to restart it.  Let's
> avoid these kinds of battles.  It's also safer if we don't try to
> touch running servers.
>
>
> That's a good point. I hadn't found an excuse to simplify this but you provided
> one. :) There was a worry about ignoring some command-line options that change
> GUCs if the server was started. There was also an ugly case for dry run mode
> that had to start the server again (if it was running) at the end. Both cases
> are no longer issues. The current code provides a suitable error if the target
> server is running.
>
> The -p option (--subscriber-port) doesn't seem to do anything.  In my
> testing, it always uses the compiled-in default port.
>
>
> It works for me. See this snippet from the regression tests. The port (50945) is
> used by pg_ctl.
>
> # Running: pg_createsubscriber --verbose --verbose --pgdata /c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata --publisher-server port=50943 host=/tmp/qpngb0bPKo dbname='pg1' --socket-directory /tmp/qpngb0bPKo --subscriber-port 50945 --database pg1 --database pg2
> pg_createsubscriber: validating connection string on publisher
> .
> .
> pg_createsubscriber: pg_ctl command is: "/c/pg_createsubscriber/tmp_install/c/pg_createsubscriber/bin/pg_ctl" start -D "/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata" -s -o "-p 50945" -o "-c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/tmp/qpngb0bPKo'"
> 2024-03-20 18:15:24.517 -03 [105195] LOG:  starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
> 2024-03-20 18:15:24.517 -03 [105195] LOG:  listening on Unix socket "/tmp/qpngb0bPKo/.s.PGSQL.50945"
>
> Printing all the server log lines to the terminal doesn't seem very
> user-friendly.  Not sure what to do about that, short of keeping a
> pg_upgrade-style directory of log files.  But it's ugly.
>
>
> I removed the previous implementation that creates a new directory and stores
> the log file there. I don't like the pg_upgrade-style directory because (a) it
> stores part of the server log files in another place and (b) it is another
> directory to ignore if your tool handles the data directory (like a backup
> tool). My last test said it prints 35 server log lines. I expect that the user
> redirects the output to a file so he/she can inspect it later if required.

Here are a few suggestions:
1) I feel displaying the server log on the console is not a good idea;
I would prefer it to be logged. There were similar comments from
Kuroda-san at [1] and Peter at [2]. The number of lines will increase
based on the log level set. If you don't want to use the pg_upgrade
style, how about exposing a log file option and logging to the
specified log file?

2) Currently, for publication, replication-slot and subscription, we
have to specify these options once per database. If we have 100
databases, we will have to specify these options 100 times, which is
not user friendly. How about something like what Tomas proposed at [3]
and Amit proposed at [4]? It would be better if the user only had to
specify the publication, replication slot and subscription options
once.
+       /* Number of object names must match number of databases */
+       if (num_pubs > 0 && num_pubs != num_dbs)
+       {
+               pg_log_error("wrong number of publication names");
+               pg_log_error_hint("Number of publication names (%d)
must match number of database names (%d).",
+                                                 num_pubs, num_dbs);
+               exit(1);
+       }

3) Can we have an option similar to dry-run which will display the
configurations required on the primary and standby nodes, something
like:
pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres"  -p 9999
-s /home/vignesh/postgres/inst/bin/ -U postgres -d db1 -d db2
--suggest-config
Suggested optimal configurations in the primary:
--------------------------------------
wal_level = logical
max_replication_slots = 3
max_wal_senders = 3
...
Suggested optimal configurations in the standby:
--------------------------------------
max_replication_slots = 3
max_wal_senders = 3
...

[1] -
https://www.postgresql.org/message-id/TY3PR01MB9889593399165B9A04106741F5662%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/7a970912-0b77-4942-84f7-2c9ca0bc05a5%40eisentraut.org
[3] - https://www.postgresql.org/message-id/6423dfeb-a729-45d3-b71e-7bf1b3adb0c9%40enterprisedb.com
[4] - https://www.postgresql.org/message-id/CAA4eK1%2BL_J4GYES6g19xqfpEVD3K2NPCAeu3tzrJfZgu_Fk9Tw%40mail.gmail.com

Regards,
Vignesh



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 21.03.24 12:35, vignesh C wrote:
> Here are a few suggestions:
> 1) I felt displaying the server log to the console is not a good idea,
> I prefer this to be logged. There were similar comments from
> Kuroda-san at [1], Peter at [2]. The number of lines will increase
> based on the log level set. If you don't want to use pg_upgrade style,
> how about exposing the log file option and logging it to the specified
> log file.

Let's leave that for the next version.  We need to wrap things up for 
this release.

> 2) Currently for publication, replication-slot and subscription, we
> will have to specify these options based on the number of databases.
> Here if we have 100 databases we will have to specify these options
> 100 times, it might not be user friendly. How about something like
> what Tomas had proposed at [3] and  Amit proposed at [4]. It will be
> better if the user has to just specify publication, replication slot
> and subscription options only one time.

Same.  Designing, implementing, discussing, and testing this cannot be 
done in the time remaining.

> +       /* Number of object names must match number of databases */
> +       if (num_pubs > 0 && num_pubs != num_dbs)
> +       {
> +               pg_log_error("wrong number of publication names");
> +               pg_log_error_hint("Number of publication names (%d)
> must match number of database names (%d).",
> +                                                 num_pubs, num_dbs);
> +               exit(1);
> +       }
> 
> 3) Can we have an option similar to dry-run which will display the
> configurations required in the primary and standby node something
> like:
> pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres"  -p 9999
> -s /home/vignesh/postgres/inst/bin/ -U postgres -d db1 -d db2
> --suggest-config
> Suggested optimal configurations in the primary:
> --------------------------------------
> wal_level = logical
> max_replication_slots = 3
> max_wal_senders = 3
> ...
> Suggested optimal configurations in the standby:
> --------------------------------------
> max_replication_slots = 3
> max_wal_senders = 3
> ...

How would this be different from what --dry-run does now?




Re: speed up a logical replica setup

From
vignesh C
Date:
On Thu, 21 Mar 2024 at 18:02, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 21.03.24 12:35, vignesh C wrote:
> > Here are a few suggestions:
> > 1) I felt displaying the server log to the console is not a good idea,
> > I prefer this to be logged. There were similar comments from
> > Kuroda-san at [1], Peter at [2]. The number of lines will increase
> > based on the log level set. If you don't want to use pg_upgrade style,
> > how about exposing the log file option and logging it to the specified
> > log file.
>
> Let's leave that for the next version.  We need to wrap things up for
> this release.

Ok, this can be done as an improvement.

> > 2) Currently for publication, replication-slot and subscription, we
> > will have to specify these options based on the number of databases.
> > Here if we have 100 databases we will have to specify these options
> > 100 times, it might not be user friendly. How about something like
> > what Tomas had proposed at [3] and  Amit proposed at [4]. It will be
> > better if the user has to just specify publication, replication slot
> > and subscription options only one time.
>
> Same.  Designing, implementing, discussing, and testing this cannot be
> done in the time remaining.

If we commit this, we might not be able to change the way the option
behaves once customers start using it. How about removing these
options in the first version and adding them in the next version after
more discussion?

> > 3) Can we have an option similar to dry-run which will display the
> > configurations required in the primary and standby node something
> > like:
> > pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres"  -p 9999
> > -s /home/vignesh/postgres/inst/bin/ -U postgres -d db1 -d db2
> > --suggest-config
> > Suggested optimal configurations in the primary:
> > --------------------------------------
> > wal_level = logical
> > max_replication_slots = 3
> > max_wal_senders = 3
> > ...
> > Suggested optimal configurations in the standby:
> > --------------------------------------
> > max_replication_slots = 3
> > max_wal_senders = 3
> > ...
>
> How would this be different from what --dry-run does now?

Currently dry-run does the checks and might fail after identifying
only a few failures, e.g. after checking the subscriber
configuration. The user then has to correct the configuration,
re-run, and fix the next set of failures. In contrast, suggest-config
would display all the optimal configuration for both the primary and
the standby in a single shot. This is not a must for the first
version; it can be done as a subsequent enhancement.

Regards,
Vignesh



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
If we commit this we might not be able to change the way the option
behaves once the customers starts using it. How about removing these
options in the first version and adding it in the next version after
more discussion.

We don't need to redesign this one if we want to add a format string in a
future version. A long time ago, pg_dump started to accept patterns for tables
without breaking or deprecating the -t option. If you have 100 databases and
you don't want to specify the options or use a script to generate them for you,
you also have the option to let pg_createsubscriber generate the object names
for you. In my experience, it will be a rare case.

Currently dry-run will do the check and might fail on identifying a
few failures like after checking subscriber configurations. Then the
user will have to correct the configuration and re-run then fix the
next set of failures. Whereas the suggest-config will display all the
optimal configuration for both the primary and the standby in a single
shot. This is not a must in the first version, it can be done as a
subsequent enhancement.

You meant publisher, right? In execution order, check_subscriber is done before
check_publisher, and it checks all settings on the subscriber before exiting. In
v30, I changed the way it reports the required settings. A previous version
failed as soon as it found a wrong setting; the current version checks all
settings on that server before providing a suitable error.

pg_createsubscriber: checking settings on publisher
pg_createsubscriber: primary has replication slot "physical_slot"
pg_createsubscriber: error: publisher requires wal_level >= logical
pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.

If you have such an error, you will fix them all and rerun using dry run mode
again to verify everything is ok. I don't have a strong preference about it. It
can be changed easily (unifying the check functions or providing a return for
each of the check functions).
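
To illustrate the "providing a return for each of the check functions" idea, here is a sketch (illustration only, with hypothetical stand-ins for the real check_subscriber()/check_publisher()): each check reports all of its problems and returns a status instead of exiting, and the caller exits only after both servers have been examined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real check functions: each one runs all of
 * its checks, logs every problem it finds, and returns false on failure
 * instead of calling exit(1) itself. */
static bool
check_subscriber_sketch(void)
{
	bool		ok = true;

	/* ... check GUCs etc. on the subscriber; set ok = false on each error ... */
	return ok;
}

static bool
check_publisher_sketch(void)
{
	bool		ok = true;

	/* ... check wal_level, slots, wal senders; set ok = false on each error ... */
	return ok;
}

/* Run every check on both servers before deciding to bail out, so that a
 * single dry run reports all problems at once. */
int
run_all_checks(void)
{
	bool		ok = true;

	if (!check_subscriber_sketch())
		ok = false;
	if (!check_publisher_sketch())
		ok = false;
	return ok ? 0 : 1;
}
```

With this shape, a dry run surfaces the subscriber's and publisher's misconfigurations in one pass instead of one server at a time.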


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
There is a compilation error while building postgres with the patch
due to a recent commit. I have attached a top-up patch v32-0003 to
resolve this compilation error.
I have not updated the version of the patch as I have not made any
change in v32-0001 and v32-0002 patch.

I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
main patch (v32-0001). This version also includes 2 new tests:

- refuse to run if the standby server is running
- refuse to run if the standby was promoted, i.e. it is not in recovery

The first one exercises a recent change (standby should be stopped) and the
second one covers an important requirement.

Based on the discussion [1] about the check functions, Vignesh suggested that it
should check both servers before exiting. v33-0003 implements it. I don't have a
strong preference; feel free to apply it.




--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Mar 21, 2024 at 8:00 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
>
> If we commit this we might not be able to change the way the option
> behaves once the customers starts using it. How about removing these
> options in the first version and adding it in the next version after
> more discussion.
>
>
> We don't need to redesign this one if we want to add a format string in a next
> version. A long time ago, pg_dump started to accept pattern for tables without
> breaking or deprecating the -t option. If you have 100 databases and you don't
> want to specify the options or use a script to generate it for you, you also
> have the option to let pg_createsubscriber generate the object names for you.
> Per my experience, it will be a rare case.
>

But, why go with some UI in the first place which we don't think is a
good one, or at least don't have a broader agreement that it is a good
one? The problem with self-generated names for users could be that
they won't be able to make much out of it. If one has to always use
those internally then probably that would be acceptable. I would
prefer what Tomas proposed a few emails ago as compared to this one as
that has fewer options to be provided by users but still, they can
later identify objects. But surely, we should discuss if you or others
have better alternatives.

The user choosing to convert a physical standby to a subscriber would
in some cases be interested in converting it for all the databases
(say for the case of upgrade [1]) and giving options for each database
would be cumbersome for her.

> Currently dry-run will do the check and might fail on identifying a
> few failures like after checking subscriber configurations. Then the
> user will have to correct the configuration and re-run then fix the
> next set of failures. Whereas the suggest-config will display all the
> optimal configuration for both the primary and the standby in a single
> shot. This is not a must in the first version, it can be done as a
> subsequent enhancement.
>
>
> Do you meant publisher, right? Per order, check_subscriber is done before
> check_publisher and it checks all settings on the subscriber before exiting. In
> v30, I changed the way it provides the required settings. In a previous version,
> it fails when it found a wrong setting; the current version, check all settings
> from that server before providing a suitable error.
>
> pg_createsubscriber: checking settings on publisher
> pg_createsubscriber: primary has replication slot "physical_slot"
> pg_createsubscriber: error: publisher requires wal_level >= logical
> pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
> pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
> pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
> pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.
>
> If you have such an error, you will fix them all and rerun using dry run mode
> again to verify everything is ok. I don't have a strong preference about it. It
> can be changed easily (unifying the check functions or providing a return for
> each of the check functions).
>

We can unify the checks but not sure if it is worth it. I am fine
either way. It would have been better if we provided a way for a user
to know the tool's requirement rather than letting him know via errors
but I think it should be okay to extend it later as well.

[1] - This tool can be used in an upgrade where the user first
converts physical standby to subscriber to get incremental changes and
then performs an online upgrade.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:

On Fri, Mar 22, 2024 at 12:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> The user choosing to convert a physical standby to a subscriber would
> in some cases be interested in converting it for all the databases
> (say for the case of upgrade [1]) and giving options for each database
> would be cumbersome for her.
>
> ...
>
> [1] - This tool can be used in an upgrade where the user first
> converts physical standby to subscriber to get incremental changes and
> then performs an online upgrade.
>

The first use case Euler and I discussed some time ago for upstreaming this tool was exactly what Amit described, so IMHO we should make it easier for users to subscribe to all existing user databases.

Regards,

--
Fabrízio de Royes Mello

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Mar 22, 2024 at 9:44 AM Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
>
> On Fri, Mar 22, 2024 at 12:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > The user choosing to convert a physical standby to a subscriber would
> > in some cases be interested in converting it for all the databases
> > (say for the case of upgrade [1]) and giving options for each database
> > would be cumbersome for her.
> >
> > ...
> >
> > [1] - This tool can be used in an upgrade where the user first
> > converts physical standby to subscriber to get incremental changes and
> > then performs an online upgrade.
> >
>
> The first use case me and Euler discussed some time ago to upstream this tool was exactly what Amit described so IMHO we should make it easier for users to subscribe to all existing user databases.
>

I feel that will be a good use case for this tool, especially now
(with this release) when we allow upgrade of subscriptions. In this
regard, I feel that if the user doesn't specify any database, we
should create subscriptions for all databases, and the user should
have a way to provide publication/slot/subscription names.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
vignesh C
Date:
On Fri, 22 Mar 2024 at 09:01, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>
> There is a compilation error while building postgres with the patch
> due to a recent commit. I have attached a top-up patch v32-0003 to
> resolve this compilation error.
> I have not updated the version of the patch as I have not made any
> change in v32-0001 and v32-0002 patch.
>
>
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
> main patch (v32-0001). This version also includes 2 new tests:
>
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery
>
> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.

Few comments:
1) In error cases, PQclear and PQfinish should be called:
+       /* Secure search_path */
+       res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not clear search_path: %s",
+                                        PQresultErrorMessage(res));
+               if (exit_on_error)
+                       exit(1);
+
+               return NULL;
+       }
+       PQclear(res);
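
The fix follows the usual libpq error-path convention: release the result and close the connection before bailing out. A stdlib-only sketch of that pattern (the types and helpers below are hypothetical stand-ins for PGresult/PGconn and PQexec/PQclear/PQfinish, used so the shape of the fix is visible):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for libpq objects. */
typedef struct Result { int status; } Result;
typedef struct Conn { int open; } Conn;

#define RESULT_OK 1

static Result *exec_query(Conn *conn) { (void) conn; Result *r = malloc(sizeof(Result)); r->status = RESULT_OK; return r; }
static void clear_result(Result *r) { free(r); }		/* stands in for PQclear */
static void close_conn(Conn *c) { c->open = 0; }		/* stands in for PQfinish */

/* On error, clean up BOTH the result and the connection before returning,
 * which is what the review comment asks the quoted hunk to do. */
static Conn *
secure_search_path(Conn *conn, int exit_on_error)
{
	Result	   *res = exec_query(conn);

	if (res->status != RESULT_OK)
	{
		fprintf(stderr, "could not clear search_path\n");
		clear_result(res);		/* was missing in the quoted hunk */
		close_conn(conn);		/* ditto */
		if (exit_on_error)
			exit(1);
		return NULL;
	}
	clear_result(res);
	return conn;
}
```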

2) Call fflush before calling a system command to get properly
ordered console output:
a) Call fflush:
+               int                     rc = system(cmd_str);
+
+               if (rc == 0)
+                       pg_log_info("subscriber successfully changed
the system identifier");
+               else
+                       pg_fatal("subscriber failed to change system
identifier: exit code: %d", rc);
+       }

b) similarly here:
+       pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd->data);
+       rc = system(pg_ctl_cmd->data);
+       pg_ctl_status(pg_ctl_cmd->data, rc);
+       standby_running = true;

c) similarly here:
+       pg_ctl_cmd = psprintf("\"%s\" stop -D \"%s\" -s", pg_ctl_path,
+                                                 datadir);
+       pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd);
+       rc = system(pg_ctl_cmd);
+       pg_ctl_status(pg_ctl_cmd, rc);
+       standby_running = false;
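
The reason for the fflush() is that when stdout is redirected to a file or pipe, stdio switches to block buffering, so the child process's output can land before the parent's earlier, still-buffered messages. A minimal runnable sketch of the suggested pattern (function name is hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Flush our own buffered output before spawning a child process, so the
 * parent's log lines reach the file/pipe before the child's output. */
int
run_command_flushed(const char *cmd)
{
	printf("running command: %s\n", cmd);
	fflush(stdout);				/* the suggested fix: flush before system() */
	fflush(stderr);
	return system(cmd);
}
```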

3) Call PQclear in error cases:
a) Call PQclear
+       res = PQexec(conn, "SELECT system_identifier FROM
pg_catalog.pg_control_system()");
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not get system identifier: %s",
+                                        PQresultErrorMessage(res));
+               disconnect_database(conn, true);
+       }

b) similarly here
+       if (PQntuples(res) != 1)
+       {
+               pg_log_error("could not get system identifier: got %d
rows, expected %d row",
+                                        PQntuples(res), 1);
+               disconnect_database(conn, true);
+       }
+

c) similarly here
+       res = PQexec(conn,
+                                "SELECT oid FROM pg_catalog.pg_database "
+                                "WHERE datname =
pg_catalog.current_database()");
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not obtain database OID: %s",
+                                        PQresultErrorMessage(res));
+               disconnect_database(conn, true);
+       }
+

d) There are few more places like this.

4) Since these are global variables, we might be able to remove the
initialization of many of them.
+       /* Default settings */
+       subscriber_dir = NULL;
+       opt.config_file = NULL;
+       opt.pub_conninfo_str = NULL;
+       opt.socket_dir = NULL;
+       opt.sub_port = DEFAULT_SUB_PORT;
+       opt.sub_username = NULL;
+       opt.database_names = (SimpleStringList){0};
+       opt.recovery_timeout = 0;
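
The reason the explicit initialization could go away is that objects with static storage duration are zero-initialized per the C standard, so the NULL/0/{0} assignments are redundant; only non-zero defaults (like DEFAULT_SUB_PORT) need explicit code. A small sketch, using a hypothetical miniature of the options struct:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the tool's options struct. */
struct options
{
	const char *config_file;
	const char *socket_dir;
	int			sub_port;
	int			recovery_timeout;
};

/* File scope => static storage duration => zero-initialized at startup:
 * pointers are NULL and integers are 0 without any explicit assignment. */
static struct options opt;

#define DEFAULT_SUB_PORT 50432	/* hypothetical default value */

void
set_defaults(void)
{
	/* Only the non-zero default actually needs an assignment. */
	opt.sub_port = DEFAULT_SUB_PORT;
}
```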

Regards,
Vignesh



Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Fri, Mar 22, 2024 at 9:02 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>
> There is a compilation error while building postgres with the patch
> due to a recent commit. I have attached a top-up patch v32-0003 to
> resolve this compilation error.
> I have not updated the version of the patch as I have not made any
> change in v32-0001 and v32-0002 patch.
>
>
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
> main patch (v32-0001). This version also includes 2 new tests:
>
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery
>
> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.
>
> Based on the discussion [1] about the check functions, Vignesh suggested that it
> should check both server before exiting. v33-0003 implements it. I don't have a
> strong preference; feel free to apply it.
>
>
> [1] https://www.postgresql.org/message-id/CALDaNm1Dg5tDRmaabk%2BZND4WF17NrNq52WZxCE%2B90-PGz5trQQ%40mail.gmail.com

I had run valgrind with pg_createsubscriber to see if there were any
issues. Valgrind reported the following issues:
==651272== LEAK SUMMARY:
==651272==    definitely lost: 1,319 bytes in 18 blocks
==651272==    indirectly lost: 1,280 bytes in 2 blocks
==651272==      possibly lost: 44 bytes in 3 blocks
==651272==    still reachable: 3,066 bytes in 22 blocks
==651272==         suppressed: 0 bytes in 0 blocks
==651272==
==651272== For lists of detected and suppressed errors, rerun with: -s
==651272== ERROR SUMMARY: 17 errors from 17 contexts (suppressed: 0 from 0)
The attached report has the details of the same.

Thanks and Regards:
Shubham Khanna.

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit, Euler,

I also want to share my opinion, just in case.

> On Thu, Mar 21, 2024 at 8:00 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
> >
> > If we commit this we might not be able to change the way the option
> > behaves once the customers starts using it. How about removing these
> > options in the first version and adding it in the next version after
> > more discussion.
> >
> >
> > We don't need to redesign this one if we want to add a format string in a next
> > version. A long time ago, pg_dump started to accept pattern for tables without
> > breaking or deprecating the -t option. If you have 100 databases and you don't
> > want to specify the options or use a script to generate it for you, you also
> > have the option to let pg_createsubscriber generate the object names for you.
> > Per my experience, it will be a rare case.
> >
> 
> But, why go with some UI in the first place which we don't think is a
> good one, or at least don't have a broader agreement that it is a good
> one? The problem with self-generated names for users could be that
> they won't be able to make much out of it. If one has to always use
> those internally then probably that would be acceptable. I would
> prefer what Tomas proposed a few emails ago as compared to this one as
> that has fewer options to be provided by users but still, they can
> later identify objects. But surely, we should discuss if you or others
> have better alternatives.

IIUC, the added options were inspired by Tomas. And he said the limitation
(that pub/sub/slot names cannot be specified) was acceptable for the first
version. I agree with him. (To be honest, I feel that there should be fewer
options for the first release.)

> The user choosing to convert a physical standby to a subscriber would
> in some cases be interested in converting it for all the databases
> (say for the case of upgrade [1]) and giving options for each database
> would be cumbersome for her.

+1 for the primary use case.

> > Currently dry-run will do the check and might fail on identifying a
> > few failures like after checking subscriber configurations. Then the
> > user will have to correct the configuration and re-run then fix the
> > next set of failures. Whereas the suggest-config will display all the
> > optimal configuration for both the primary and the standby in a single
> > shot. This is not a must in the first version, it can be done as a
> > subsequent enhancement.
> >
> >
> > Do you meant publisher, right? Per order, check_subscriber is done before
> > check_publisher and it checks all settings on the subscriber before exiting. In
> > v30, I changed the way it provides the required settings. In a previous version,
> > it fails when it found a wrong setting; the current version, check all settings
> > from that server before providing a suitable error.
> >
> > pg_createsubscriber: checking settings on publisher
> > pg_createsubscriber: primary has replication slot "physical_slot"
> > pg_createsubscriber: error: publisher requires wal_level >= logical
> > pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
> > pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
> > pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
> > pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.
> >
> > If you have such an error, you will fix them all and rerun using dry run mode
> > again to verify everything is ok. I don't have a strong preference about it. It
> > can be changed easily (unifying the check functions or providing a return for
> > each of the check functions).
> >
> 
> We can unify the checks but not sure if it is worth it. I am fine
> either way. It would have been better if we provided a way for a user
> to know the tool's requirement rather than letting him know via errors
> but I think it should be okay to extend it later as well.

Both ways are OK, but I prefer to unify the checks a bit. The number of working
modes in the same executable should be reduced as much as possible.

Also, I feel the current behavior is acceptable. pg_upgrade checks one cluster
at a time and exits early in bad cases: if both the old and new clusters have
issues, the first run only reports the old one's issues. After the DBA fixes
them and runs again, issues on the new cluster are reported.

[1]: https://www.postgresql.org/message-id/8d52c226-7e34-44f7-a919-759bf8d81541%40enterprisedb.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
> IIUC, added options were inspired by Tomas. And he said the limitation
> (pub/sub/slot
> name cannot be specified) was acceptable for the first version. I agree with him.
> (To be honest, I feel that options should be fewer for the first release)

Just to confirm - I don't think it is strictly needed. If we can agree on a
specification at some point, it's OK for me to add them. If you ask me, I still
prefer Tomas's approach.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

> I had run valgrind with pg_createsubscriber to see if there were any
> issues.

Thanks for running the tool!

> Valgrind reported the following issues:
> ==651272== LEAK SUMMARY:
> ==651272==    definitely lost: 1,319 bytes in 18 blocks
> ==651272==    indirectly lost: 1,280 bytes in 2 blocks
> ==651272==      possibly lost: 44 bytes in 3 blocks
> ==651272==    still reachable: 3,066 bytes in 22 blocks
> ==651272==         suppressed: 0 bytes in 0 blocks
> ==651272==
> ==651272== For lists of detected and suppressed errors, rerun with: -s
> ==651272== ERROR SUMMARY: 17 errors from 17 contexts (suppressed: 0 from
> 0)
> The attached report has the details of the same.

I read the report. I'm not sure all entries must be fixed. Based on other client
tools, old discussions [1], and the current code, I thought we could derive the
following rules:

* For global variables (and their attributes), no need to free the allocated memory.
* For local variables (and their attributes) in main(), no need to free the allocated memory.
* For local variables in other functions, they should be free'd at the end of the function.

Per the above rules and your report, I made a top-up patch which adds pg_free() and
destroyPQExpBuffer() in several places. What do you think?

[1]: https://www.postgresql.org/message-id/40595e73-c7e1-463a-b8be-49792e870007%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 22.03.24 04:31, Euler Taveira wrote:
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>> There is a compilation error while building postgres with the patch
>> due to a recent commit. I have attached a top-up patch v32-0003 to
>> resolve this compilation error.
>> I have not updated the version of the patch as I have not made any
>> change in v32-0001 and v32-0002 patch.
> 
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) 
> into the
> main patch (v32-0001). This version also includes 2 new tests:
> 
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery
> 
> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.

I have committed your version v33.  I did another pass over the 
identifier and literal quoting.  I added quoting for replication slot 
names, for example, even though they can only contain a restricted set 
of characters, but it felt better to be defensive there.

I'm happy to entertain follow-up patches on some of the details like 
option naming that were still being discussed.  I just wanted to get the 
main functionality in in good time.  We can fine-tune the rest over the 
next few weeks.

> Based on the discussion [1] about the check functions, Vignesh suggested 
> that it
> should check both server before exiting. v33-0003 implements it. I don't 
> have a
> strong preference; feel free to apply it.

I haven't done anything about this.




Re: speed up a logical replica setup

From
Bharath Rupireddy
Date:
On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have committed your version v33.

Looks like BF animals aren't happy, please check -
https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 25.03.24 13:36, Bharath Rupireddy wrote:
> On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> I have committed your version v33.
> 
> Looks like BF animals aren't happy, please check -
> https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.

Looks like sanitizer failures.  There were a few messages about that 
recently, but those were all just about freeing memory after use, which 
we don't necessarily require for client programs.  So maybe something else.




Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Mar 25, 2024, at 8:55 AM, Peter Eisentraut wrote:
On 22.03.24 04:31, Euler Taveira wrote:
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>> There is a compilation error while building postgres with the patch
>> due to a recent commit. I have attached a top-up patch v32-0003 to
>> resolve this compilation error.
>> I have not updated the version of the patch as I have not made any
>> change in v32-0001 and v32-0002 patch.

> I'm attaching a new version (v33) to incorporate this fix (v32-0003) 
> into the
> main patch (v32-0001). This version also includes 2 new tests:

> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery

> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.

I have committed your version v33.  I did another pass over the 
identifier and literal quoting.  I added quoting for replication slot 
names, for example, even though they can only contain a restricted set 
of characters, but it felt better to be defensive there.
Thanks.

I'm happy to entertain follow-up patches on some of the details like 
option naming that were still being discussed.  I just wanted to get the 
main functionality in in good time.  We can fine-tune the rest over the 
next few weeks.
Agree. Let's continue the discussion about the details.

> Based on the discussion [1] about the check functions, Vignesh suggested 
> that it
> should check both server before exiting. v33-0003 implements it. I don't 
> have a
> strong preference; feel free to apply it.

I haven't done anything about this.
... including this one.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Bharath, Peter,

> Looks like BF animals aren't happy, please check -
> > https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.
> 
> Looks like sanitizer failures.  There were a few messages about that
> recently, but those were all just about freeing memory after use, which
> we don't necessarily require for client programs.  So maybe something else.

It seems that there are several types of failures, [1] and [2].

## Analysis for failure 1

The failure is caused by a time lag between the walreceiver shutting down and
pg_is_in_recovery() starting to return false.

According to the output [1], it seems that the tool failed at wait_for_end_recovery()
with the message "standby server disconnected from the primary". Also, the lines
"redo done at..." and "terminating walreceiver process due to administrator command"
mean that the walreceiver was requested to shut down by XLogShutdownWalRcv().

Looking at the source, we can confirm that the walreceiver is shut down in
StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(), while SharedRecoveryState
is only changed to RECOVERY_STATE_DONE (at which point pg_is_in_recovery() starts
returning false) in the latter part of StartupXLOG().

So, if there is a delay between FinishWalRecovery() and the state change, the check
in wait_for_end_recovery() can fail during that window. Since we allow the
walreceiver to be missing up to 10 times, checked once per second, the failure
occurs if the time lag is longer than 10 seconds.

I do not have a good way to fix it. One approach is to make NUM_CONN_ATTEMPTS
larger, but that's not a fundamental solution.

## Analysis for failure 2

According to [2], the physical replication slot specified as primary_slot_name
was not being used by a walsender process; at that time, no walsender existed yet.

```
...
pg_createsubscriber: publisher: current wal senders: 0
pg_createsubscriber: command is: SELECT 1 FROM pg_catalog.pg_replication_slots WHERE active AND slot_name =
'physical_slot'
pg_createsubscriber: error: could not obtain replication slot information: got 0 rows, expected 1 row
...
```

Currently the standby must be stopped before running the command, and the current
code does not wait for replication to start. So there is a possibility that the
check runs before the walsender is launched.

One possible approach is to wait until the replication starts. An alternative is
to relax the condition.

What do you think?

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-03-25%2013%3A03%3A07
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-03-25%2013%3A53%3A58

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 

Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 25 Mar 2024 at 21:36, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Bharath, Peter,
>
> > Looks like BF animals aren't happy, please check -
> > > https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.
> >
> > Looks like sanitizer failures.  There were a few messages about that
> > recently, but those were all just about freeing memory after use, which
> > we don't necessarily require for client programs.  So maybe something else.
>
> It seems that there are several time of failures, [1] and [2].
>
> ## Analysis for failure 1
>
> The failure is caused by a time lag between the walreceiver shutting down and
> pg_is_in_recovery() starting to return false.
>
> According to the output [1], it seems that the tool failed at wait_for_end_recovery()
> with the message "standby server disconnected from the primary". Also, lines
> "redo done at..." and "terminating walreceiver process due to administrator command"
> meant that walreceiver was requested to shut down by XLogShutdownWalRcv().
>
> According to the source, we confirm that walreceiver is shut down in
> StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also, SharedRecoveryState
> is changed to RECOVERY_STATE_DONE (at which point pg_is_in_recovery() returns false)
> at the latter part of StartupXLOG().
>
> So, if there is a delay between FinishWalRecovery() and change the state, the check
> in wait_for_end_recovery() would be failed during the time. Since we allow to miss
> the walreceiver 10 times and it is checked once per second, the failure occurs if
> the time lag is longer than 10 seconds.
>
> I do not have a good way to fix it. One approach is make NUM_CONN_ATTEMPTS larger,
> but it's not a fundamental solution.

I agree with your analysis, another way to fix could be to remove the
following check as increasing the count might still have the race
condition issue:
/*
* If it is still in recovery, make sure the target server is
* connected to the primary so it can receive the required WAL to
* finish the recovery process. If it is disconnected try
* NUM_CONN_ATTEMPTS in a row and bail out if not succeed.
*/
res = PQexec(conn,
"SELECT 1 FROM pg_catalog.pg_stat_wal_receiver");

I'm not sure whether we should worry about the case where recovery is not done
and the walreceiver has exited, as we have the following sanity check in
check_subscriber before we wait for recovery to finish:
/* The target server must be a standby */
if (!server_is_in_recovery(conn))
{
pg_log_error("target server must be a standby");
disconnect_database(conn, true);
}

Regards,
Vignesh



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have committed your version v33.  I did another pass over the
> identifier and literal quoting.  I added quoting for replication slot
> names, for example, even though they can only contain a restricted set
> of characters, but it felt better to be defensive there.
>
> I'm happy to entertain follow-up patches on some of the details like
> option naming that were still being discussed.  I just wanted to get the
> main functionality in in good time.  We can fine-tune the rest over the
> next few weeks.
>

I was looking at prior discussions on this topic to see if there are
any other open design points apart from this and noticed that the
points raised/discussed in the email [1] are also not addressed. IIRC,
the key point we discussed was that after promotion, the existing
replication objects should be removed (either optionally or always),
otherwise, it can lead to a new subscriber not being able to restart
or getting some unwarranted data.

[1] - https://www.postgresql.org/message-id/CAExHW5t4ew7ZrgcDdTv7YmuG7LVQT1ZaEny_EvtngHtEBNyjcQ%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote:
## Analysis for failure 1

The failure is caused by a time lag between the walreceiver shutting down and
pg_is_in_recovery() starting to return false.

According to the output [1], it seems that the tool failed at wait_for_end_recovery()
with the message "standby server disconnected from the primary". Also, lines
"redo done at..." and "terminating walreceiver process due to administrator command"
meant that walreceiver was requested to shut down by XLogShutdownWalRcv().

According to the source, we confirm that walreceiver is shut down in
StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also, SharedRecoveryState
is changed to RECOVERY_STATE_DONE (at which point pg_is_in_recovery() returns false)
at the latter part of StartupXLOG().

So, if there is a delay between FinishWalRecovery() and change the state, the check
in wait_for_end_recovery() would be failed during the time. Since we allow to miss
the walreceiver 10 times and it is checked once per second, the failure occurs if
the time lag is longer than 10 seconds.

I do not have a good way to fix it. One approach is make NUM_CONN_ATTEMPTS larger,
but it's not a fundamental solution.

I was expecting that slow hosts might have issues in wait_for_end_recovery().
As you said it took a lot of steps between FinishWalRecovery() (where
walreceiver is shutdown -- XLogShutdownWalRcv) and SharedRecoveryState is set to
RECOVERY_STATE_DONE. If this window takes longer than NUM_CONN_ATTEMPTS *
WAIT_INTERVAL (10 seconds), it aborts the execution. That's a bad decision
because it already finished the promotion and it is just doing the final
preparation for the host to become a primary.

        /*   
         * If it is still in recovery, make sure the target server is
         * connected to the primary so it can receive the required WAL to
         * finish the recovery process. If it is disconnected try
         * NUM_CONN_ATTEMPTS in a row and bail out if not succeed.
         */
        res = PQexec(conn,
                     "SELECT 1 FROM pg_catalog.pg_stat_wal_receiver");
        if (PQntuples(res) == 0)
        {
            if (++count > NUM_CONN_ATTEMPTS)
            {
                stop_standby_server(subscriber_dir);
                pg_log_error("standby server disconnected from the primary");
                break;
            }    
        }
        else
            count = 0;          /* reset counter if it connects again */

This code was added to defend against the death/crash of the target server. There
are at least 3 options:

(1) increase NUM_CONN_ATTEMPTS * WAIT_INTERVAL seconds. We discussed this constant
and I decided to use 10 seconds because even in some slow hosts, this time
wasn't reached during my tests. It seems I forgot to test the combination of slow
host, asserts enabled, and ubsan. I didn't notice that pg_promote() uses 60
seconds as default wait. Maybe that's a reasonable value. I checked the
004_timeline_switch test and the last run took: 39.2s (serinus), 33.1s
(culicidae), 18.31s (calliphoridae) and 27.52s (olingo).

(2) check if the primary is not running when the walreceiver is not available on the
target server. Increase the connection attempts only if the primary is not running.
Hence, the described case doesn't cause an increment on the count variable.

(3) set the recovery_timeout default to a nonzero value and remove the
pg_stat_wal_receiver check that protects against the death/crash of the target
server. I explained in a previous message that a timeout may occur in cases where
WAL replay takes more than recovery-timeout seconds to reach a consistent state.

Option (1) is the easiest fix; however, we can hit the same issue again if a slow
host decides to be even slower, and we would have to adjust the value again.
Option (2) interprets the walreceiver's absence, while the primary server is still
running, as a sign that the target server is on the verge of finishing recovery.
Option (3) is not as resilient as the other options.

The first patch implements a combination of (1) and (2).

## Analysis for failure 2

According to [2], the physical replication slot which is specified as primary_slot_name
was not used by the walsender process. At that time walsender has not existed.

```
...
pg_createsubscriber: publisher: current wal senders: 0
pg_createsubscriber: command is: SELECT 1 FROM pg_catalog.pg_replication_slots WHERE active AND slot_name = 'physical_slot'
pg_createsubscriber: error: could not obtain replication slot information: got 0 rows, expected 1 row
...
```

Currently standby must be stopped before the command and current code does not
block the flow to ensure the replication is started. So there is a possibility
that the checking is run before walsender is launched.

One possible approach is to wait until the replication starts. Alternative one is
to ease the condition.

That's my suggestion too. I reused NUM_CONN_ATTEMPTS (that was renamed to
NUM_ATTEMPTS in the first patch). See second patch.


--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Mar 25, 2024, at 11:33 PM, Amit Kapila wrote:
On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have committed your version v33.  I did another pass over the
> identifier and literal quoting.  I added quoting for replication slot
> names, for example, even though they can only contain a restricted set
> of characters, but it felt better to be defensive there.
>
> I'm happy to entertain follow-up patches on some of the details like
> option naming that were still being discussed.  I just wanted to get the
> main functionality in in good time.  We can fine-tune the rest over the
> next few weeks.
>

I was looking at prior discussions on this topic to see if there are
any other open design points apart from this and noticed that the
points raised/discussed in the email [1] are also not addressed. IIRC,
the key point we discussed was that after promotion, the existing
replication objects should be removed (either optionally or always),
otherwise, it can lead to a new subscriber not being able to restart
or getting some unwarranted data.

See setup_subscriber.

        /*
         * Since the publication was created before the consistent LSN, it is
         * available on the subscriber when the physical replica is promoted.
         * Remove publications from the subscriber because it has no use.
         */
        drop_publication(conn, &dbinfo[i]);


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Tue, Mar 26, 2024 at 8:27 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Mar 25, 2024, at 11:33 PM, Amit Kapila wrote:
>
> On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > I have committed your version v33.  I did another pass over the
> > identifier and literal quoting.  I added quoting for replication slot
> > names, for example, even though they can only contain a restricted set
> > of characters, but it felt better to be defensive there.
> >
> > I'm happy to entertain follow-up patches on some of the details like
> > option naming that were still being discussed.  I just wanted to get the
> > main functionality in in good time.  We can fine-tune the rest over the
> > next few weeks.
> >
>
> I was looking at prior discussions on this topic to see if there are
> any other open design points apart from this and noticed that the
> points raised/discussed in the email [1] are also not addressed. IIRC,
> the key point we discussed was that after promotion, the existing
> replication objects should be removed (either optionally or always),
> otherwise, it can lead to a new subscriber not being able to restart
> or getting some unwarranted data.
>
>
> See setup_subscriber.
>
>         /*
>          * Since the publication was created before the consistent LSN, it is
>          * available on the subscriber when the physical replica is promoted.
>          * Remove publications from the subscriber because it has no use.
>          */
>         drop_publication(conn, &dbinfo[I]);
>

This only drops the publications created by this tool, not the
pre-existing ones that we discussed in the link provided.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit, Euler,

> 
> This only drops the publications created by this tool, not the
> pre-existing ones that we discussed in the link provided.

Another concern here is the case in which the primary subscribes to changes from
other servers. After the conversion, the new subscriber also tries to connect to
those other publishers - this may lead to conflicts. It happens because both the
launcher and the workers start after recovery finishes. So, based on Ashutosh's
point, should we remove such replication objects?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
Tomas Vondra
Date:
On 3/26/24 03:53, Euler Taveira wrote:
> On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote:
>> ## Analysis for failure 1
>>
>> The failure is caused by a time lag between the walreceiver shutting down and
>> pg_is_in_recovery() starting to return false.
>>
>> According to the output [1], it seems that the tool failed at wait_for_end_recovery()
>> with the message "standby server disconnected from the primary". Also, lines
>> "redo done at..." and "terminating walreceiver process due to administrator command"
>> meant that walreceiver was requested to shut down by XLogShutdownWalRcv().
>>
>> According to the source, we confirm that walreceiver is shut down in
>> StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also, SharedRecoveryState
>> is changed to RECOVERY_STATE_DONE (at which point pg_is_in_recovery() returns false)
>> at the latter part of StartupXLOG().
>>
>> So, if there is a delay between FinishWalRecovery() and change the state, the check
>> in wait_for_end_recovery() would be failed during the time. Since we allow to miss
>> the walreceiver 10 times and it is checked once per second, the failure occurs if
>> the time lag is longer than 10 seconds.
>>
>> I do not have a good way to fix it. One approach is make NUM_CONN_ATTEMPTS larger,
>> but it's not a fundamental solution.
> 
> I was expecting that slow hosts might have issues in wait_for_end_recovery().
> As you said it took a lot of steps between FinishWalRecovery() (where
> walreceiver is shutdown -- XLogShutdownWalRcv) and SharedRecoveryState is set to
> RECOVERY_STATE_DONE. If this window takes longer than NUM_CONN_ATTEMPTS *
> WAIT_INTERVAL (10 seconds), it aborts the execution. That's a bad decision
> because it already finished the promotion and it is just doing the final
> preparation for the host to become a primary.
> 
>         /*   
>          * If it is still in recovery, make sure the target server is
>          * connected to the primary so it can receive the required WAL to
>          * finish the recovery process. If it is disconnected try
>          * NUM_CONN_ATTEMPTS in a row and bail out if not succeed.
>          */
>         res = PQexec(conn,
>                      "SELECT 1 FROM pg_catalog.pg_stat_wal_receiver");
>         if (PQntuples(res) == 0)
>         {
>             if (++count > NUM_CONN_ATTEMPTS)
>             {
>                 stop_standby_server(subscriber_dir);
>                 pg_log_error("standby server disconnected from the primary");
>                 break;
>             }    
>         }
>         else
>             count = 0;          /* reset counter if it connects again */
> 
> This code was add to defend against the death/crash of the target server. There
> are at least 3 options:
> 
> (1) increase NUM_CONN_ATTEMPTS * WAIT_INTERVAL seconds. We discussed this constant
> and I decided to use 10 seconds because even in some slow hosts, this time
> wasn't reached during my tests. It seems I forgot to test the combination of slow
> host, asserts enabled, and ubsan. I didn't notice that pg_promote() uses 60
> seconds as default wait. Maybe that's a reasonable value. I checked the
> 004_timeline_switch test and the last run took: 39.2s (serinus), 33.1s
> (culicidae), 18.31s (calliphoridae) and 27.52s (olingo).
> 
> (2) check if the primary is not running when the walreceiver is not available on the
> target server. Increase the connection attempts only if the primary is not running.
> Hence, the described case doesn't cause an increment on the count variable.
> 
> (3) set recovery_timeout default to != 0 and remove pg_stat_wal_receiver check
> protection against the death/crash target server. I explained in a previous
> message that timeout may occur in cases that WAL replay to reach consistent
> state takes more than recovery-timeout seconds.
> 
> Option (1) is the easiest fix, however, we can have the same issue again if a
> slow host decides to be even slower, hence, we have to adjust this value again.
> Option (2) interprets the walreceiver absence as a recovery end and if the
> primary server is running it can indicate that the target server is in the
> imminence of the recovery end. Option (3) is not as resilient as the other
> options.
> 
> The first patch implements a combination of (1) and (2).
> 
>> ## Analysis for failure 2
>>
>> According to [2], the physical replication slot which is specified as primary_slot_name
>> was not used by the walsender process. At that time walsender has not existed.
>>
>> ```
>> ...
>> pg_createsubscriber: publisher: current wal senders: 0
>> pg_createsubscriber: command is: SELECT 1 FROM pg_catalog.pg_replication_slots WHERE active AND slot_name =
'physical_slot'
>> pg_createsubscriber: error: could not obtain replication slot information: got 0 rows, expected 1 row
>> ...
>> ```
>>
>> Currently standby must be stopped before the command and current code does not
>> block the flow to ensure the replication is started. So there is a possibility
>> that the checking is run before walsender is launched.
>>
>> One possible approach is to wait until the replication starts. Alternative one is
>> to ease the condition.
> 
> That's my suggestion too. I reused NUM_CONN_ATTEMPTS (that was renamed to
> NUM_ATTEMPTS in the first patch). See second patch.
> 

Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
waited long enough?

IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling
pg_createsubscriber, and that should do the trick.

Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing
things like ubsan/valgrind already use to deal with exactly this kind of
timeout problem.

Or is there a deeper problem with deciding if the system is in recovery?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
waited long enough?

It was an attempt to decouple a connection failure (while WAL is still being
streamed) from the recovery timeout. The NUM_CONN_ATTEMPTS check guarantees that if
the primary is gone during the standby recovery process, there is a way to bail out.
The recovery-timeout is 0 (infinite) by default, so you would have an infinite wait
without this check. The idea behind this implementation is to avoid exiting in this
critical code path; if it times out here you might have to rebuild the standby and
start again. Amit suggested [1] that we use a nonzero recovery-timeout, but how high
is a good value? I've already seen some long recovery processes, using the pglogical
equivalent, that timed out after hundreds of minutes. Maybe I'm too worried about a
small percentage of cases and we should use 1h as the default, for example. It would
reduce the complexity, since the recovery process lacks progress indicators (the LSN
is not sufficient in this case and there isn't a function to report the current
state -- stop applying WAL, reach target, new timeline, etc.).

If we remove the pg_stat_wal_receiver check, we should avoid infinite recovery by
default; otherwise we will get reports saying the tool is hanging when in reality
the primary is gone and no more WAL can be streamed.

IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling
pg_createsubscriber, and that should do the trick.

That's a good idea. Tests are not exercising the recovery-timeout option.

Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing
things like ubsan/valgrind already use to deal with exactly this kind of
timeout problem.

Or is there a deeper problem with deciding if the system is in recovery?

As I said, with some recovery progress indicators it would be easier to make
decisions like waiting a few more seconds because the WAL has already been applied
and the server is creating a new timeline. The recovery timeout decision is a shot
in the dark, because we might be aborting pg_createsubscriber when the target
server is about to set RECOVERY_STATE_DONE.


--
Euler Taveira

Re: speed up a logical replica setup

From
Tomas Vondra
Date:
On 3/26/24 21:17, Euler Taveira wrote:
> On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
>> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
>> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
>> waited long enough?
> 
> It was an attempt to decoupled a connection failure (that keeps streaming the
> WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the primary
> is gone during the standby recovery process, there is a way to bail out. The
> recovery-timeout is 0 (infinite) by default so you have an infinite wait without
> this check. The idea behind this implementation is to avoid exiting in this
> critical code path. If it times out here you might have to rebuild the standby
> and start again.

- This seems like something that should definitely be documented in the
comment before wait_for_end_recovery(). At the moment it only talks
about timeout, and nothing about NUM_CONN_ATTEMPTS.

- The NUM_CONN_ATTEMPTS name seems rather misleading, considering it
does not really count connection attempts, but number of times we have
not seen 1 in pg_catalog.pg_stat_wal_receiver.

- Not sure I follow the logic - it tries to avoid exiting by setting an infinite
timeout, but it still exits based on NUM_CONN_ATTEMPTS. Isn't that somewhat
contradictory?

- Isn't the NUM_CONN_ATTEMPTS actually making it more fragile, i.e. more
likely to exit? For example, what if there's a short networking hiccup,
so that the standby can't connect to the primary.

- It seems a bit strange that even with the recovery timeout set, having
the limit of 10 "connection attempts" effectively establishes a separate
hard-coded limit of 10 seconds. Seems a bit surprising if I set recovery
limit to 1 minute, and it just dies after 10 seconds.

> Amit suggested [1] that we use a value as recovery-timeout but
> how high is a good value? I've already saw some long recovery process using
> pglogical equivalent that timeout out after hundreds of minutes. Maybe I'm too
> worried about a small percentage of cases and we should use 1h as default, for
> example. It would reduce the complexity since the recovery process lacks some
> progress indicators (LSN is not sufficient in this case and there isn't a
> function to provide the current state -- stop applying WAL, reach target, new
> timeline, etc).
> 
> If we remove the pg_stat_wal_receiver check, we should avoid infinite recovery
> by default otherwise we will have some reports saying the tool is hanging when
> in reality the primary has gone and WAL should be streamed.
> 

I don't think there's a default timeout value that would work for
everyone. Either it's going to be too short for some cases, or it'll
take too long for some other cases.

I think there are two obvious default values for the timeout - infinity,
and 60 seconds, which is the default we use for other CLI tools (like
pg_ctl and so on). Considering the negative impact of exiting, I'd say
it's better to default to infinity. It's always possible to Ctrl-C or
terminate the process in some other way, if needed.

As for people complaining about infinite recovery - perhaps it'd be
sufficient to mention this in the messages printed by the tool, to make
it clearer. Or maybe even print something in the loop, because right now
it's entirely silent so it's easy to believe it's stuck. Perhaps not on
every loop, but at least in verbose mode it should print something.

>> IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling
>> pg_createsubscriber, and that should do the trick.
> 
> That's a good idea. Tests are not exercising the recovery-timeout option.
> 
>> Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing
>> things like ubsan/valgrind already use to deal with exactly this kind of
>> timeout problem.
>>
>> Or is there a deeper problem with deciding if the system is in recovery?
> 
> As I said, with some recovery progress indicators it would be easier to make
> decisions like waiting a few seconds because the WAL has already been applied and
> it is creating a new timeline. The recovery timeout decision is a shot in the
> dark because we might be aborting pg_createsubscriber when the target server is
> about to set RECOVERY_STATE_DONE.
> 

Isn't it enough to check data in pg_stat_replication on the primary?
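
(For illustration, a query along these lines on the primary would show whether
the standby is still attached and how far it has replayed; the
application_name value here is a placeholder:)

```sql
-- On the primary: is the standby connected, and where is it?
SELECT application_name, state, replay_lsn
FROM pg_stat_replication
WHERE application_name = 'standby1';
```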


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
>
> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> waited long enough?
>
>
> It was an attempt to decouple a connection failure (that keeps streaming the
> WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the primary
> is gone during the standby recovery process, there is a way to bail out.
>

I think we don't need to check primary if the WAL corresponding to
consistent_lsn is already present on the standby. Shouldn't we first
check that? Once we ensure that the required WAL is copied, just
checking server_is_in_recovery() should be sufficient. I feel that
will be a direct way of ensuring what is required rather than
indirectly verifying the same (by checking pg_stat_wal_receiver) as we
are doing currently.
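
As a sketch of that direction (the LSN literal is a placeholder for the
consistent_lsn computed earlier, and server_is_in_recovery() presumably boils
down to something like the second query):

```sql
-- On the standby: has WAL up to the consistent LSN already been received?
SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), '0/3000148') >= 0;

-- If so, polling this until it returns false should be enough:
SELECT pg_is_in_recovery();
```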

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Tue, Mar 26, 2024 at 8:24 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote:
>
> The first patch implements a combination of (1) and (2).
>
> ## Analysis for failure 2
>
> According to [2], the physical replication slot specified as primary_slot_name
> was not used by any walsender process. At that time, the walsender did not exist yet.
>
> ```
> ...
> pg_createsubscriber: publisher: current wal senders: 0
> pg_createsubscriber: command is: SELECT 1 FROM pg_catalog.pg_replication_slots WHERE active AND slot_name =
'physical_slot'
> pg_createsubscriber: error: could not obtain replication slot information: got 0 rows, expected 1 row
> ...
> ```
>
> Currently the standby must be stopped before the command runs, and the current
> code does not wait for replication to start. So there is a possibility
> that the check is run before the walsender is launched.
> 
> One possible approach is to wait until replication starts. An alternative is
> to relax the condition.
>
>
> That's my suggestion too. I reused NUM_CONN_ATTEMPTS (that was renamed to
> NUM_ATTEMPTS in the first patch). See second patch.
>

How can we guarantee that the slot will become active after this many
attempts? It still could be possible that on some slow machines it
didn't get activated even after NUM_ATTEMPTS. BTW, in the first place,
why do we need to ensure that the 'primary_slot_name' is active on the
primary?

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Apr 29, 2024, at 6:56 AM, Amit Kapila wrote:
On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
>
> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> waited long enough?
>
>
> It was an attempt to decouple a connection failure (that keeps streaming the
> WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the primary
> is gone during the standby recovery process, there is a way to bail out.
>

I think we don't need to check primary if the WAL corresponding to
consistent_lsn is already present on the standby. Shouldn't we first
check that? Once we ensure that the required WAL is copied, just
checking server_is_in_recovery() should be sufficient. I feel that
will be a direct way of ensuring what is required rather than
indirectly verifying the same (by checking pg_stat_wal_receiver) as we
are doing currently.

How would you check it? WAL file? During recovery, you are not allowed to use
pg_current_wal_lsn.

Tomas suggested to me off-list that we should adopt a simple solution in
wait_for_end_recovery: wait for recovery_timeout without additional checks
(which means removing the pg_stat_wal_receiver logic).  When we have additional
information that we can reliably use in this function, we can add it. This also
makes it easy to adjust PG_TEST_TIMEOUT_DEFAULT to keep the tests stable.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Apr 29, 2024, at 6:56 AM, Amit Kapila wrote:
>
> On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
> >
> > Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> > Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> > waited long enough?
> >
> >
> > It was an attempt to decouple a connection failure (that keeps streaming the
> > WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the primary
> > is gone during the standby recovery process, there is a way to bail out.
> >
>
> I think we don't need to check primary if the WAL corresponding to
> consistent_lsn is already present on the standby. Shouldn't we first
> check that? Once we ensure that the required WAL is copied, just
> checking server_is_in_recovery() should be sufficient. I feel that
> will be a direct way of ensuring what is required rather than
> indirectly verifying the same (by checking pg_stat_wal_receiver) as we
> are doing currently.
>
>
> How would you check it? WAL file? During recovery, you are not allowed to use
> pg_current_wal_lsn.
>

How about pg_last_wal_receive_lsn()?

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Apr 29, 2024 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira <euler@eulerto.com> wrote:
>

I was trying to test this utility when 'sync_replication_slots' is on
and it gets into an ERROR loop [1] and never finishes. Please find the
postgresql.auto.conf used on the standby attached. I think if the standby
has enabled sync_slots, you need to pass dbname in
GenerateRecoveryConfig(). I couldn't test it further but I wonder if
there are already synced slots on the standby (either due to
'sync_replication_slots' or users have used
pg_sync_replication_slots() before invoking pg_createsubscriber),
those would be retained as it is on new subscriber and lead to
unnecessary WAL retention and dead rows.

[1]
2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
requires dbname to be specified in primary_conninfo
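
For reference, the error above goes away once primary_conninfo carries a
dbname; a sketch of the relevant standby settings (host, user, and database
names are illustrative, and other slot-sync prerequisites such as
primary_slot_name are omitted):

```
# postgresql.auto.conf on the standby (illustrative values)
primary_conninfo = 'host=primary port=5432 user=repluser dbname=postgres'
sync_replication_slots = on
hot_standby_feedback = on
```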

--
With Regards,
Amit Kapila.

Attachment

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Tue, Apr 30, 2024 at 12:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Apr 29, 2024 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira <euler@eulerto.com> wrote:
> >
>
> I was trying to test this utility when 'sync_replication_slots' is on
> and it gets in an ERROR loop [1] and never finishes. Please find the
> postgresql.auto used on the standby attached. I think if the standby
> has enabled sync_slots, you need to pass dbname in
> GenerateRecoveryConfig().

The other possibility is that when we start standby from
pg_createsubscriber, we specifically set 'sync_replication_slots' as
false.

>
> I couldn't test it further but I wonder if
> there are already synced slots on the standby (either due to
> 'sync_replication_slots' or users have used
> pg_sync_replication_slots() before invoking pg_createsubscriber),
> those would be retained as it is on new subscriber and lead to
> unnecessary WAL retention and dead rows.
>

This still needs some handling.

BTW, I don't see the use of following messages in --dry-run mode or
rather they could be misleading:
pg_createsubscriber: hint: If pg_createsubscriber fails after this
point, you must recreate the physical replica before continuing.
...
...
pg_createsubscriber: setting the replication progress (node name
"pg_0" ; LSN 0/0) on database "postgres"

Similarly, we should think if below messages are useful in --dry-run mode:
pg_createsubscriber: dropping publication
"pg_createsubscriber_5_887f7991" on database "postgres"
pg_createsubscriber: creating subscription
"pg_createsubscriber_5_887f7991" on database "postgres"
...
pg_createsubscriber: enabling subscription
"pg_createsubscriber_5_887f7991" on database "postgres"

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Thomas Munro
Date:
040_pg_createsubscriber.pl seems to be failing occasionally on
culicidae near "--dry-run on node S".  I couldn't immediately see why.
That animal is using EXEC_BACKEND and I also saw a one-off failure a
bit like that on my own local Linux + EXEC_BACKEND test run
(sorry I didn't keep the details around).  Coincidence?



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Sun, May 19, 2024 at 4:25 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> 040_pg_createsubscriber.pl seems to be failing occasionally on
> culicidae near "--dry-run on node S".  I couldn't immediately see why.
> That animal is using EXEC_BACKEND and I also saw a one-off failure a
> bit like that on my own local Linux + EXEC_BACKEND test run
> (sorry I didn't keep the details around).  Coincidence?
>

AFAICS, this is the same as one of the two BF failures being discussed
in this thread.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,
>
> I was trying to test this utility when 'sync_replication_slots' is on
> and it gets in an ERROR loop [1] and never finishes. Please find the
> postgresql.auto used on the standby attached. I think if the standby
> has enabled sync_slots, you need to pass dbname in
> GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> there are already synced slots on the standby (either due to
> 'sync_replication_slots' or users have used
> pg_sync_replication_slots() before invoking pg_createsubscriber),
> those would be retained as it is on new subscriber and lead to
> unnecessary WAL retention and dead rows.
>
> [1]
> 2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
> 2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
> requires dbname to be specified in primary_conninfo

Hi,

I tested the scenario posted by Amit in [1], in which retaining synced
slots lead to unnecessary WAL retention and ERROR. This is raised as
the second open point in [2].
The steps to reproduce the issue:
(1) Setup physical replication with sync slot feature turned on by
setting sync_replication_slots = 'true' or using
pg_sync_replication_slots() on the standby node.
For physical replication setup, run pg_basebackup with -R  and -d option.
(2) Create a logical replication slot on primary node with failover
option as true. A corresponding slot is created on standby as part of
sync slot feature.
(3) Run pg_createsubscriber on standby node.
(4) On checking the replication slots on the standby node, I noticed
that the logical slots created in step 2 are retained.
I have attached the script to reproduce the issue.

Kuroda-san and I worked to resolve the open points. Here are patches to
solve the second and third points in [2].
Patches proposed by Euler are also attached just in case, but they
were not modified.

v2-0001: not changed
v2-0002: not changed
v2-0003: ensures the slot sync is disabled during the conversion. This
resolves the second point.
v2-0004: drops sync slots which may be retained after running. This
resolves the second point.
v2-0005: removes misleading output messages in dry-run. This resolves
the third point.

[1]: https://www.postgresql.org/message-id/CAA4eK1KdCb%2B5sjYu6qCMXXdCX1y_ihr8kFzMozq0%3DP%3DauYxgog%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAA4eK1J22UEfrqx222h5j9DQ7nxGrTbAa_BC%2B%3DmQXdXs-RCsew%40mail.gmail.com

Thanks and Regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, May 20, 2024 at 4:30 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > I was trying to test this utility when 'sync_replication_slots' is on
> > and it gets in an ERROR loop [1] and never finishes. Please find the
> > postgresql.auto used on the standby attached. I think if the standby
> > has enabled sync_slots, you need to pass dbname in
> > GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> > there are already synced slots on the standby (either due to
> > 'sync_replication_slots' or users have used
> > pg_sync_replication_slots() before invoking pg_createsubscriber),
> > those would be retained as it is on new subscriber and lead to
> > unnecessary WAL retention and dead rows.
> >
> > [1]
> > 2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
> > 2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
> > requires dbname to be specified in primary_conninfo
>
> Hi,
>
> I tested the scenario posted by Amit in [1], in which retaining synced
> slots lead to unnecessary WAL retention and ERROR. This is raised as
> the second open point in [2].
> The steps to reproduce the issue:
> (1) Setup physical replication with sync slot feature turned on by
> setting sync_replication_slots = 'true' or using
> pg_sync_replication_slots() on the standby node.
> For physical replication setup, run pg_basebackup with -R  and -d option.
> (2) Create a logical replication slot on primary node with failover
> option as true. A corresponding slot is created on standby as part of
> sync slot feature.
> (3) Run pg_createsubscriber on standby node.
> (4) On Checking for the replication slot on standby node, I noticed
> that the logical slots created in step 2 are retained.
>  I have attached the script to reproduce the issue.
>
> I and Kuroda-san worked to resolve open points. Here are patches to
> solve the second and third point in [2].
> Patches proposed by Euler are also attached just in case, but they
> were not modified.
>
> v2-0001: not changed
>

Shouldn't we modify it as per the suggestion given in the email [1]? I
am wondering if we can entirely get rid of checking the primary
and simply rely on recovery_timeout, while we keep checking
server_is_in_recovery(). If so, we can modify the test to use a
non-default recovery_timeout (say 180s or something similar if we have
used it at any other place). As an additional check we can ensure that
consistent_lsn is present on the standby.

> v2-0002: not changed
>

We have added more tries to see if the primary_slot_name becomes
active, but I think it is still fragile because it is possible on slow
machines that the required slot doesn't become active even after more
retries. I have raised the same comment previously [2] and asked an
additional question but didn't get any response.

[1] - https://www.postgresql.org/message-id/CAA4eK1JJq_ER6Kq_H%3DjKHR75QPRd8y9_D%3DRtYw%3DaPYKMfqLi9A%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1LT3Z13Dg6p4Z%2B4adO_EY-ow5CmWfikEmBfL%3DeVrm8CPw%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
>
> v2-0001: not changed
>

Shouldn't we modify it as per the suggestion given in the email [1]? I
am wondering if we can entirely get rid of checking the primary
and simply rely on recovery_timeout, while we keep checking
server_is_in_recovery(). If so, we can modify the test to use a
non-default recovery_timeout (say 180s or something similar if we have
used it at any other place). As an additional check we can ensure that
consistent_lsn is present on the standby.

That's exactly what I want to propose. Tomas convinced me off-list that less is
better: since we don't have a useful recovery progress reporting mechanism to
make sure recovery is still making progress, we should simply wait.

> v2-0002: not changed
>

We have added more tries to see if the primary_slot_name becomes
active but I think it is still fragile because it is possible on slow
machines that the required slot didn't become active even after more
retries. I have raised the same comment previously [2] and asked an
additional question but didn't get any response.

Following the same line that simplifies the code, we can: (a) add a loop in
check_subscriber() that waits until the walreceiver is available on the subscriber or
(b) use a timeout. The main advantage of (a) is that the primary slot is already
available, but I'm afraid we need an escape mechanism for the loop (timeout?).
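
For option (a), the loop could poll something along these lines on the
subscriber until it returns a row (bailing out after a timeout):

```sql
-- On the subscriber: is a walreceiver up and streaming yet?
SELECT 1 FROM pg_stat_wal_receiver WHERE status = 'streaming';
```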

I'll summarize all issues as soon as I finish the review of sync slot support. I
think we should avoid new development if we judge that the item can be
documented as a limitation for this version. Nevertheless, I will share patches
so you can give your opinion on whether it is an open item or new development.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Wed, May 22, 2024 at 8:46 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
>
> > v2-0002: not changed
> >
>
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.
>
>
> Following the same line that simplifies the code, we can: (a) add a loop in
> check_subscriber() that waits until the walreceiver is available on the subscriber or
> (b) use a timeout. The main advantage of (a) is that the primary slot is already
> available, but I'm afraid we need an escape mechanism for the loop (timeout?).
>

Sorry, it is not clear to me why we need any additional loop in
check_subscriber(); aren't we talking about the problem in the
check_publisher() function?

Why in the first place do we need to ensure that primary_slot_name is
active on the primary? You mentioned something related to WAL
retention but I don't know how that is related to this tool's
functionality. If at all, we are bothered about WAL retention on the
primary that should be the WAL corresponding to consistent_lsn
computed by setup_publisher() but this check doesn't seem to ensure
that.

--
With Regards,
Amit Kapila.