Thread: speed up a logical replica setup


From
"Euler Taveira"
Date:
Hi,

Logical replication has been used for migrations with minimal downtime. However,
if you are dealing with a big database, the amount of required resources (disk
-- due to WAL retention) increases as the backlog (WAL) grows. Unless you
have a generous amount of resources and can wait for a long period of time until
the new replica catches up, creating a logical replica is impractical on
large databases.

The general idea is to create and convert a physical replica or a base backup
(with archived WAL files available) into a logical replica. The initial data
copy and catchup tend to be faster on a physical replica. This technique has
been successfully used in pglogical_create_subscriber [1].

A new tool called pg_subscriber does this conversion and is tightly integrated
with Postgres.

DESIGN

The conversion requires 8 steps.

1. Check if the target data directory has the same system identifier as the
source data directory.
2. Stop the target server if it is running as a standby server. (Modifying
recovery parameters requires a restart.)
3. Create one replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent LSN.
(This consistent LSN will be used as (a) a stopping point for the recovery
process and (b) a starting point for the subscriptions.)
4. Write recovery parameters into the target data directory and start the
target server. (Wait until the target server is promoted.)
5. Create one publication (FOR ALL TABLES) per specified database on the source
server.
6. Create one subscription per specified database on the target server. (Use
the replication slot and publication created in a previous step. Don't enable
the subscriptions yet.)
7. Set the replication progress to the consistent LSN that was obtained in a
previous step.
8. Enable the subscription for each specified database on the target server.
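
The per-database steps above can be sketched as SQL (a hand-written sketch:
the object names and the LSN are illustrative, and step 4 is done via recovery
parameters rather than SQL):

```sql
-- Step 3, on the source, per database (plus one extra slot for the
-- consistent LSN):
SELECT pg_create_logical_replication_slot('pg_subscriber_db1', 'pgoutput');

-- Step 5, on the source, per database:
CREATE PUBLICATION pg_subscriber_db1 FOR ALL TABLES;

-- Step 6, on the target, per database, after promotion:
CREATE SUBSCRIPTION pg_subscriber_db1
  CONNECTION 'host=source dbname=db1' PUBLICATION pg_subscriber_db1
  WITH (create_slot = false, slot_name = 'pg_subscriber_db1',
        copy_data = false, enabled = false);

-- Steps 7 and 8, on the target: advance the origin (named after the
-- subscription OID) to the consistent LSN, then enable:
SELECT pg_replication_origin_advance('pg_' || oid, '0/3000148')
  FROM pg_subscription WHERE subname = 'pg_subscriber_db1';
ALTER SUBSCRIPTION pg_subscriber_db1 ENABLE;
```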

This tool does not take a base backup. That can certainly be added later;
there is already a tool that does it: pg_basebackup.

There is a --subscriber-conninfo option to specify the subscriber connection
string; however, we could remove it, since this tool runs on the subscriber
and can build a connection string itself.

NAME

I'm not sure about the proposed name. I came up with this one because it is not
too long. The most recently added tools use a pg_ prefix, a verb (action), and
an object. pg_initsubscriber and pg_createsubscriber are names I considered,
but I'm not excited about them.

DOCUMENTATION

Documentation is included and describes this tool.

TESTS

Basic tests are included. More tests are needed to properly exercise this tool.


Comments?



--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Andres Freund
Date:
Hi,

On 2022-02-21 09:09:12 -0300, Euler Taveira wrote:
> Logical replication has been used to migration with minimal downtime. However,
> if you are dealing with a big database, the amount of required resources (disk
> -- due to WAL retention) increases as the backlog (WAL) increases. Unless you
> have a generous amount of resources and can wait for long period of time until
> the new replica catches up, creating a logical replica is impracticable on
> large databases.

Indeed.


> DESIGN
> 
> The conversion requires 8 steps.
> 
> 1. Check if the target data directory has the same system identifier than the
> source data directory.
> 2. Stop the target server if it is running as a standby server. (Modify
> recovery parameters requires a restart.)
> 3. Create one replication slot per specified database on the source server. One
> additional replication slot is created at the end to get the consistent LSN
> (This consistent LSN will be used as (a) a stopping point for the recovery
> process and (b) a starting point for the subscriptions).
> 4. Write recovery parameters into the target data directory and start the
> target server (Wait until the target server is promoted).
> 5. Create one publication (FOR ALL TABLES) per specified database on the source
> server.
> 6. Create one subscription per specified database on the target server (Use
> replication slot and publication created in a previous step. Don't enable the
> subscriptions yet).
> 7. Sets the replication progress to the consistent LSN that was got in a
> previous step.
> 8. Enable the subscription for each specified database on the target server.

I think the system identifier should also be changed, otherwise you can way
too easily get into situations trying to apply WAL from different systems to
each other. Not going to end well, obviously.


> This tool does not take a base backup. It can certainly be included later.
> There is already a tool do it: pg_basebackup.

It would make sense to allow calling pg_basebackup from the new tool. Perhaps
with a --pg-basebackup-parameters option or such.



Greetings,

Andres Freund



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Feb 21, 2022, at 8:28 PM, Andres Freund wrote:
> I think the system identifier should also be changed, otherwise you can way
> too easily get into situations trying to apply WAL from different systems to
> each other. Not going to end well, obviously.

Good point.

> This tool does not take a base backup. It can certainly be included later.
> There is already a tool do it: pg_basebackup.

> It would make sense to allow to call pg_basebackup from the new tool. Perhaps
> with a --pg-basebackup-parameters or such.

Yeah. I'm planning to do that in the near future. There are a few questions on
my mind. Should we call pg_basebackup directly (like
pglogical_create_subscriber does) or use the base backup machinery to obtain
the backup? If we choose the former, we should probably sanitize
--pg-basebackup-parameters to allow only a subset of the command-line options.
AFAICS the latter requires some refactoring in the pg_basebackup code --
e.g. exposing at least one function (BaseBackup?) that accepts a struct of
command-line options as a parameter and returns success/failure. Another
possibility is to implement a simple BASE_BACKUP command via the replication
protocol. The disadvantages are: (a) it could duplicate code and (b) it might
require maintenance if new options are added to the BASE_BACKUP command.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Feb 21, 2022 at 5:41 PM Euler Taveira <euler@eulerto.com> wrote:
>
> Logical replication has been used to migration with minimal downtime. However,
> if you are dealing with a big database, the amount of required resources (disk
> -- due to WAL retention) increases as the backlog (WAL) increases. Unless you
> have a generous amount of resources and can wait for long period of time until
> the new replica catches up, creating a logical replica is impracticable on
> large databases.
>
> The general idea is to create and convert a physical replica or a base backup
> (archived WAL files available) into a logical replica. The initial data copy
> and catchup tends to be faster on a physical replica. This technique has been
> successfully used in pglogical_create_subscriber [1].
>

Sounds like a promising idea.

> A new tool called pg_subscriber does this conversion and is tightly integrated
> with Postgres.
>
> DESIGN
>
> The conversion requires 8 steps.
>
> 1. Check if the target data directory has the same system identifier than the
> source data directory.
> 2. Stop the target server if it is running as a standby server. (Modify
> recovery parameters requires a restart.)
> 3. Create one replication slot per specified database on the source server. One
> additional replication slot is created at the end to get the consistent LSN
> (This consistent LSN will be used as (a) a stopping point for the recovery
> process and (b) a starting point for the subscriptions).
>

What is the need to create an extra slot other than the slot for each
database? Can't we use the largest LSN returned by the slots as the
recovery-target-lsn and the starting point for the subscriptions?
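
(If I understand the suggestion, the largest LSN could in principle be
obtained with something like the following, assuming the tool's internal
slot-name prefix:)

```sql
SELECT max(confirmed_flush_lsn)
  FROM pg_replication_slots
 WHERE slot_name LIKE 'pg_subscriber_%';
```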

Also, how will these additional slots get freed or reused if, say, the
server crashes or is stopped after creating the slots but before creating
the subscriptions? Users won't even know the names of such slots, as they
are created internally.

> 4. Write recovery parameters into the target data directory and start the
> target server (Wait until the target server is promoted).
> 5. Create one publication (FOR ALL TABLES) per specified database on the source
> server.
> 6. Create one subscription per specified database on the target server (Use
> replication slot and publication created in a previous step. Don't enable the
> subscriptions yet).
> 7. Sets the replication progress to the consistent LSN that was got in a
> previous step.
> 8. Enable the subscription for each specified database on the target server.
>
> This tool does not take a base backup. It can certainly be included later.
> There is already a tool do it: pg_basebackup.
>

The backup will include all the databases present on the source server. Do
we need to provide a way/recommendation to remove the databases that are not
required?

Can we see some numbers with various database (cluster) sizes, to see how
the time compares to the traditional method for small to large databases?
This might help in advising users on when to use this tool.


-- 
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Andreas Karlsson
Date:
On 2/21/22 13:09, Euler Taveira wrote:
> DESIGN
> 
> The conversion requires 8 steps.
> 
> 1. Check if the target data directory has the same system identifier 
> than the
> source data directory.
> 2. Stop the target server if it is running as a standby server. (Modify
> recovery parameters requires a restart.)
> 3. Create one replication slot per specified database on the source 
> server. One
> additional replication slot is created at the end to get the consistent LSN
> (This consistent LSN will be used as (a) a stopping point for the recovery
> process and (b) a starting point for the subscriptions).
> 4. Write recovery parameters into the target data directory and start the
> target server (Wait until the target server is promoted).
> 5. Create one publication (FOR ALL TABLES) per specified database on the 
> source
> server.
> 6. Create one subscription per specified database on the target server (Use
> replication slot and publication created in a previous step. Don't 
> enable the
> subscriptions yet).
> 7. Sets the replication progress to the consistent LSN that was got in a
> previous step.
> 8. Enable the subscription for each specified database on the target server.

Very interesting!

I actually proposed a similar design just a couple of weeks ago for
upgrading a database of a customer of mine. We have not tried it yet, so it
is not decided whether we will go ahead with it.

In our case the goal is a bit different so my idea is that we will use 
pg_dump/pg_restore (or pg_upgrade and then some manual cleanup if 
pg_dump/pg_restore is too slow) on the target server. The goal of this 
design is to get a nice clean logical replica at the new version of 
PostgreSQL with indexes with the correct collations, all old invalid 
constraints validated, minimal bloat, etc. And all of this without 
creating bloat or putting too much load on the old master during the 
process. We have plenty of disk space and plenty of time so those are 
not limitations in our case. I can go into more detail if there is interest.

It is nice to see that our approach is not entirely unique. :) And I 
will take a look at this patch when I find the time.

Andreas



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 21.02.22 13:09, Euler Taveira wrote:
> A new tool called pg_subscriber does this conversion and is tightly 
> integrated
> with Postgres.

Are we comfortable with the name pg_subscriber?  It seems too general.
Are we planning other subscriber-related operations in the future?  If
so, we should at least make this one use a --create option or
something like that.

  doc/src/sgml/ref/pg_subscriber.sgml

Attached is a patch that reorganizes the man page a bit.  I moved the
description of the steps to the Notes section and formatted it
differently.  I think the steps are interesting but not essential for
using the program, so I wanted to get them out of the main
description.

  src/bin/pg_subscriber/pg_subscriber.c

+   if (made_temp_replslot)
+   {
+       conn = connect_database(dbinfo[0].pubconninfo, true);
+       drop_replication_slot(conn, &dbinfo[0], temp_replslot);
+       disconnect_database(conn);
+   }

Temp slots don't need to be cleaned up.

+/*
+ * Obtain the system identifier from control file. It will be used to compare
+ * if a data directory is a clone of another one. This routine is used locally
+ * and avoids a replication connection.
+ */
+static char *
+get_control_from_datadir(const char *datadir)

This could return uint64 directly, without string conversion.
get_sysid_from_conn() could then convert to uint64 internally.

+       {"verbose", no_argument, NULL, 'v'},

I'm not sure if the --verbose option is all that useful.

+       {"stop-subscriber", no_argument, NULL, 1},

This option doesn't seem to be otherwise supported or documented.

+   pub_sysid = pg_malloc(32);
+   pub_sysid = get_sysid_from_conn(dbinfo[0].pubconninfo);
+   sub_sysid = pg_malloc(32);
+   sub_sysid = get_control_from_datadir(subscriber_dir);

These mallocs don't appear to be of any use.

+   dbname_conninfo = pg_malloc(NAMEDATALEN);

This seems wrong.

Overall, this code could use a bit more structure.  There are
a lot of helper functions that don't do much and are mostly
duplicate run-this-SQL-command calls.  But the main() function is
still huge.  There is room for refinement.

  src/bin/pg_subscriber/t/001_basic.pl

Good start, but obviously, we'll need some real test cases here also.

  src/bin/initdb/initdb.c
  src/bin/pg_ctl/pg_ctl.c
  src/common/file_utils.c
  src/include/common/file_utils.h

I recommend skipping this refactoring.  The readfile() function from
pg_ctl is not general enough to warrant the pride of place of a
globally available function.  Note that it is specifically geared
toward some of pg_ctl's requirements, for example that the underlying
file can change while it is being read.

The requirements of pg_subscriber can be satisfied more easily: Just
call pg_ctl to start the server.  You are already using that in
pg_subscriber.  Is there a reason it can't be used here as well?
Attachment

Re: speed up a logical replica setup

From
Andrew Dunstan
Date:
On 3/15/22 09:51, Peter Eisentraut wrote:
> On 21.02.22 13:09, Euler Taveira wrote:
>> A new tool called pg_subscriber does this conversion and is tightly
>> integrated
>> with Postgres.
>
> Are we comfortable with the name pg_subscriber?  It seems too general.
> Are we planning other subscriber-related operations in the future?  If
> so, we should at least make this one use a --create option or
> something like that.


Not really sold on the name (and I didn't much like the name
pglogical_create_subscriber either, although it's a cool facility and
I'm happy to see us adopting something like it).

ISTM we should have a name that conveys that we are *converting* a
replica or equivalent to a subscriber.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: speed up a logical replica setup

From
Andres Freund
Date:
Hi,

On 2022-02-21 09:09:12 -0300, Euler Taveira wrote:
> A new tool called pg_subscriber does this conversion and is tightly integrated
> with Postgres.

Given that this was submitted just before the last CF, is a patch of
nontrivial size, and hasn't made significant progress, ISTM it should be
moved to the next CF?

It currently fails in cfbot, but that's likely just due to Peter's incremental
patch. Perhaps you could make sure the patch still applies and repost?

Greetings,

Andres Freund



Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:


On Fri, 18 Mar 2022 at 19:34 Andrew Dunstan <andrew@dunslane.net> wrote:
> On 3/15/22 09:51, Peter Eisentraut wrote:
>> On 21.02.22 13:09, Euler Taveira wrote:
>>> A new tool called pg_subscriber does this conversion and is tightly
>>> integrated
>>> with Postgres.
>>
>> Are we comfortable with the name pg_subscriber?  It seems too general.
>> Are we planning other subscriber-related operations in the future?  If
>> so, we should at least make this one use a --create option or
>> something like that.
>
> Not really sold on the name (and I didn't much like the name
> pglogical_create_subscriber either, although it's a cool facility and
> I'm happy to see us adopting something like it).
>
> ISTM we should have a name that conveys that we are *converting* a
> replica or equivalent to a subscriber.


Some time ago I did a POC on it [1] and I used the name pg_create_subscriber

--
Fabrízio de Royes Mello

Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 18.03.22 23:34, Andrew Dunstan wrote:
> On 3/15/22 09:51, Peter Eisentraut wrote:
>> On 21.02.22 13:09, Euler Taveira wrote:
>>> A new tool called pg_subscriber does this conversion and is tightly
>>> integrated
>>> with Postgres.
>>
>> Are we comfortable with the name pg_subscriber?  It seems too general.
>> Are we planning other subscriber-related operations in the future?  If
>> so, we should at least make this one use a --create option or
>> something like that.
> 
> 
> Not really sold on the name (and I didn't much like the name
> pglogical_create_subscriber either, although it's a cool facility and
> I'm happy to see us adopting something like it).
> 
> ISTM we should have a name that conveys that we are *converting* a
> replica or equivalent to a subscriber.

The pglogical tool includes the pg_basebackup run, so it actually 
"creates" the subscriber from scratch.  Whether this tool is also doing 
that is still being discussed.



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 22.03.22 02:25, Andres Freund wrote:
> On 2022-02-21 09:09:12 -0300, Euler Taveira wrote:
>> A new tool called pg_subscriber does this conversion and is tightly integrated
>> with Postgres.
> 
> Given that this has been submitted just before the last CF and is a patch of
> nontrivial size, has't made significant progress ISTM it should be moved to
> the next CF?

done



Re: speed up a logical replica setup

From
Jacob Champion
Date:
This entry has been waiting on author input for a while (our current
threshold is roughly two weeks), so I've marked it Returned with
Feedback.

Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting

    https://commitfest.postgresql.org/38/3556/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Feb 21, 2022, at 9:09 AM, Euler Taveira wrote:
> A new tool called pg_subscriber does this conversion and is tightly integrated
> with Postgres.

After a long period of inactivity, I'm back to this client tool. As suggested
by Andres, I added a new helper function to change the system identifier as the
last step. I also thought about including the pg_basebackup support but decided
to keep it simple (at least for this current version). The user can always
execute pg_basebackup as a preliminary step to create a standby replica and it
will work. (I will post a separate patch that includes the pg_basebackup
support on the top of this one.)

Amit asked if an extra replication slot is required. It is not. The reason I
keep it is as a reminder that at least the last replication slot needs to be
created after pg_basebackup finishes (the pg_backup_stop() call). Regarding the
atexit() routine, it does its best to remove all the objects it created;
however, there is no guarantee it can remove them, because that depends on
external resources such as connectivity and authorization. I added a new
warning message if it cannot drop the transient replication slot. It is
probably a good idea to add such a warning message to the cleanup routine too.
Related to this point, another possible feature is one that checks for and
removes all leftover objects. The transient replication slot is fine because it
should always be removed at the end. However, the big question is how to detect
that you are not removing objects (publications, subscriptions, replication
slots) from a successful conversion.
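
Until such a feature exists, a manual check along these lines could list the
candidates (assuming the tool's naming prefix; the patterns are illustrative):

```sql
-- Possible leftover objects from an aborted conversion, on source and target:
SELECT slot_name FROM pg_replication_slots WHERE slot_name LIKE 'pg_subscriber_%';
SELECT pubname FROM pg_publication WHERE pubname LIKE 'pg_subscriber_%';
SELECT subname FROM pg_subscription WHERE subname LIKE 'pg_subscriber_%';
```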

Amit also asked about setting up a logical replica with m databases, where m is
less than the total number of databases. One option is to remove the "extra"
databases on the target server after promoting the physical replica, or in one
of the last steps. Maybe it is time to propose a partial physical replica that
contains only a subset of the databases on the primary. (I'm not volunteering
for it.) Alternatively, pg_basebackup could gain an option to remove these
"extra" databases so this tool can take advantage of it.

Let's continue with the bike shedding... I agree with Peter E that this name
does not express what this tool does. At the moment, it only has one action:
create. If I had to suggest other actions, I would say it could support a
switchover option too (that removes the infrastructure created by this tool).
If we decide to keep this name, it would be a good idea to add an option to
indicate which action it is executing (similar to pg_recvlogical), as
suggested by Peter.

I included the documentation cleanups that Peter E shared. I also made small
adjustments to the documentation. It probably deserves a warning section
about the cleanup.

I refactored the transient replication slot code and decided to use a permanent
(instead of temporary) slot to avoid keeping a replication connection open for
a long time until the target server catches up.

The system identifier functions (get_control_from_datadir() and
get_sysid_from_conn()) now return uint64, as suggested by Peter.

After reflection, the --verbose option should be renamed to --progress. There
are also some messages that should be converted to debug messages.

I fixed the useless mallocs. I rearranged the code a bit, but main() still has
~370 lines (~255 lines without options/validation). I'm trying to rearrange the
code to make it easier to read and at the same time reduce the size of main().
I already have a few candidates in mind, such as the code that stops the
standby and the part that writes the recovery parameters. I removed the
refactoring I proposed in the previous patch; the current code relies on the
pg_ctl --wait behavior. Are there issues with this choice? Well, one annoying
aspect is that pg_ctl does not have a "wait forever" option. If one of the
pg_ctl calls fails, you probably have to start over (unless you understand the
pg_subscriber internals and fix the setup yourself). You have to choose an
arbitrary timeout value and expect that pg_ctl *does* perform the action
within that timeout.

Real tests are included. The cleanup code does not have coverage because a
simple reproducible case isn't easy to construct. I'm also not sure it is
worth it; we can explain it in the warning section that was proposed.

It is still a WIP but I would like to share it and get some feedback.


--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Ashutosh Bapat
Date:
On Mon, Oct 23, 2023 at 9:34 AM Euler Taveira <euler@eulerto.com> wrote:

>
> It is still a WIP but I would like to share it and get some feedback.
>
>

I have started reviewing the patch. I have just read through all the
code. It's well documented and clear. Next I will review the design in
detail. Here are a couple of minor comments:
1.
+tests += {
+ 'name': 'pg_subscriber',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'tap': {
+ 'tests': [
+ 't/001_basic.pl',

COMMENT
Shouldn't we include 002_standby.pl?

2. CreateReplicationSlotLSN is not used anywhere. Instead I see
create_logical_replication_slot() in pg_subscriber.c. Which of the two do
you intend to use in the end?

--
Best Wishes,
Ashutosh Bapat



Re: speed up a logical replica setup

From
shihao zhong
Date:
I think this is a duplicate of https://commitfest.postgresql.org/45/4637/

The new status of this patch is: Waiting on Author

Re: speed up a logical replica setup

From
Ashutosh Bapat
Date:
On Thu, Oct 26, 2023 at 5:17 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, Oct 23, 2023 at 9:34 AM Euler Taveira <euler@eulerto.com> wrote:
>
> >
> > It is still a WIP but I would like to share it and get some feedback.
> >
> >
>
> I have started reviewing the patch. I have just read through all the
> code. It's well documented and clear. Next I will review the design in
> detail.

Here are some comments about functionality and design.

+ <step>
+ <para>
+ <application>pg_subscriber</application> creates one replication slot for
+ each specified database on the source server. The replication slot name
+ contains a <literal>pg_subscriber</literal> prefix. These replication
+ slots will be used by the subscriptions in a future step. Another
+ replication slot is used to get a consistent start location. This
+ consistent LSN will be used as a stopping point in the <xref
+ linkend="guc-recovery-target-lsn"/> parameter and by the
+ subscriptions as a replication starting point. It guarantees that no
+ transaction will be lost.
+ </para>
+ </step>

CREATE_REPLICATION_SLOT would wait for any incomplete transaction to
complete. So it may not be possible to have an incomplete transaction
on the standby when it comes out of recovery. Am I correct? Can we please
have a test case for this scenario? What about prepared transactions?

+
+ <step>
+ <para>
+ <application>pg_subscriber</application> writes recovery parameters into
+ the target data directory and starts the target server. It specifies an LSN
+ (the consistent LSN that was obtained in the previous step) of the
+ write-ahead log location up to which recovery will proceed. It also
+ specifies <literal>promote</literal> as the action that the server should
+ take once the recovery target is reached. This step finishes once the
+ server ends standby mode and is accepting read-write operations.
+ </para>
+ </step>
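
(For reference, the recovery parameters described here amount to something
like the following; the values are illustrative:)

```
# appended to postgresql.auto.conf on the target (a standby.signal file is
# also present while the server is in recovery)
recovery_target_lsn = '0/3000148'
recovery_target_action = 'promote'
```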

At this stage the standby would have various replication objects like
publications, subscriptions, and origins inherited from the upstream
server, possibly very much active. With failover slots, it might also
inherit replication slots. Is it intended that the new subscriber acts as a
publisher for the source's subscribers, OR that the new subscriber should
subscribe to the upstreams of the source? Some use cases like a logical
standby might require that, but a multi-master multi-node setup may not.
The behaviour should be user configurable.

There may be other objects in this category which need special consideration on
the subscriber. I haven't fully thought through the list of such objects.

+ uses the replication slot that was created in a previous step. The
+ subscription is created but it is not enabled yet. The reason is the
+ replication progress must be set to the consistent LSN but replication
+ origin name contains the subscription oid in its name. Hence, the

Not able to understand the sentence "The reason is ... in its name".
Why does the subscription OID in the origin name matter?

+ <para>
+ <application>pg_subscriber</application> stops the target server to change
+ its system identifier.
+ </para>

I expected the subscriber to be started after this step.

Why do we need pg_resetwal?

+ appendPQExpBuffer(str, "CREATE_REPLICATION_SLOT \"%s\"", slot_name);
+ appendPQExpBufferStr(str, " LOGICAL \"pgoutput\" NOEXPORT_SNAPSHOT");

Hardcoding the output plugin name would limit this utility to the
built-in plugin. Any reason for that limitation?
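
(For context, the code quoted above builds the replication-protocol command,
which conveniently returns the consistent point; an illustrative session,
with a made-up slot name and LSN:)

```sql
-- On a connection opened with "replication=database" in the conninfo:
CREATE_REPLICATION_SLOT "pg_subscriber_16384" LOGICAL "pgoutput" NOEXPORT_SNAPSHOT;
--  slot_name           | consistent_point | snapshot_name | output_plugin
-- ---------------------+------------------+---------------+---------------
--  pg_subscriber_16384 | 0/3000148        |               | pgoutput
```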

In its current form the utility creates a logical subscriber which
subscribes to all the tables (and sequences, once we have sequence
replication). But it would be useful even in the case of selective
replication from a very large database. In such a case the new subscriber
would need to (a) remove the unwanted objects and (b) subscribe to
publications publishing the "interesting" objects. We don't need to support
this case now, but the current functionality (including the interface) and
design shouldn't prevent us from doing so. Have you thought about this case?

I noticed some differences between this and a similar utility
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_create_subscriber.c.
I will be reviewing these differences next to see if we are missing
anything here.

--
Best Wishes,
Ashutosh Bapat



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Oct 31, 2023, at 11:46 PM, shihao zhong wrote:
> I think this is duplicate with https://commitfest.postgresql.org/45/4637/
>
> The new status of this patch is: Waiting on Author

I withdrew the other entry.


--
Euler Taveira

Re: speed up a logical replica setup

From
Ashutosh Bapat
Date:
On Wed, Nov 1, 2023 at 7:10 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> I noticed some differences between this and a similar utility
> https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_create_subscriber.c.
> I will be reviewing these differences next to see if we are missing
> anything here.

Some more missing things to discuss

Handling signals - The utility cleans up leftover objects on exit.
But the default signal handlers will make the utility exit without
proper cleanup [1]. The signal handlers could clean up the objects
themselves or at least report the objects that need to be cleaned up.

Idempotent behaviour - Given that the utility will be used when a very
large amount of data is involved, redoing everything after a network
glitch or a temporary failure should be avoided. This is true when users
start with a base backup. Again, I don't think we should try to be
idempotent in v1, but the current design shouldn't stop us from doing so
later. I didn't find anything like that in my review, but it is something
to keep in mind.

That finishes my first round of review. I will wait for your updated
patches before the next round.

[1] NOTEs section in man atexit().


--
Best Wishes,
Ashutosh Bapat



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 23.10.23 05:53, Euler Taveira wrote:
> Let's continue with the bike shedding... I agree with Peter E that this name
> does not express what this tool is. At the moment, it only have one action:
> create. If I have to suggest other actions I would say that it could support
> switchover option too (that removes the infrastructure created by this 
> tool).
> If we decide to keep this name, it should be a good idea to add an option to
> indicate what action it is executing (similar to pg_recvlogical) as 
> suggested
> by Peter.

Speaking of which, would it make sense to put this tool (whatever the 
name) into the pg_basebackup directory?  It's sort of related, and it 
also shares some code.




Re: speed up a logical replica setup

From
Michael Paquier
Date:
On Tue, Nov 07, 2023 at 10:00:39PM +0100, Peter Eisentraut wrote:
> Speaking of which, would it make sense to put this tool (whatever the name)
> into the pg_basebackup directory?  It's sort of related, and it also shares
> some code.

I've read the patch, and the additions to streamutil.h and
streamutil.c make it kind of natural to have it sit in pg_basebackup/.
There's pg_recvlogical already there.  I am wondering about two
things, though:
- Should the subdirectory pg_basebackup be renamed into something more
generic at this point?  All these things are frontend tools that deal
in some way with the replication protocol to do their work.  Say
a replication_tools?
- And if it would be better to refactor some of the code generic to
all these streaming tools to fe_utils.  What makes streamutil.h a bit
less pluggable are all its extern variables to control the connection,
but perhaps that can be an advantage, as well, in some cases.
--
Michael

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Nov 7, 2023, at 8:12 PM, Michael Paquier wrote:
On Tue, Nov 07, 2023 at 10:00:39PM +0100, Peter Eisentraut wrote:
> Speaking of which, would it make sense to put this tool (whatever the name)
> into the pg_basebackup directory?  It's sort of related, and it also shares
> some code.

I used CreateReplicationSlot() from streamutil.h at first, but decided to use
the CREATE_REPLICATION_SLOT command directly because I need the LSN as output.
As you noticed at the time, I didn't want a dependency on the pg_basebackup
header files; if we move this binary to the pg_basebackup directory, it seems
natural to refactor the referred function and use it.

I've read the patch, and the additions to streamutil.h and
streamutil.c make it kind of natural to have it sit in pg_basebackup/.
There's pg_recvlogical already there.  I am wondering about two
things, though:
- Should the subdirectory pg_basebackup be renamed into something more
generic at this point?  All these things are frontend tools that deal
in some way with the replication protocol to do their work.  Say
a replication_tools?

It is a good fit for this tool, since it is another replication tool. I also
agree with renaming the directory; it seems confusing that the directory has
the same name as one of its binaries while also containing other related
binaries.

- And if it would be better to refactor some of the code generic to
all these streaming tools to fe_utils.  What makes streamutil.h a bit
less pluggable are all its extern variables to control the connection,
but perhaps that can be an advantage, as well, in some cases.

I like it. There are common functions such as GetConnection(),
CreateReplicationSlot(), DropReplicationSlot() and RunIdentifySystem() that
are used by all of these replication tools. We could turn the extern variables
into parameters to make streamutil.h pluggable.
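As a rough illustration of that refactoring (names invented; the real
streamutil API differs), the extern connection variables could become a struct
that each helper receives explicitly:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical replacement for streamutil's extern connection globals
 * (dbhost, dbport, dbuser, ...): bundle them into a struct that callers
 * pass explicitly, so the helpers carry no file-level state. */
typedef struct ConnOptions
{
    const char *host;
    const char *port;
    const char *username;
} ConnOptions;

/* A helper that previously read extern variables now receives its
 * settings as a parameter, making it usable from fe_utils. */
int
build_conninfo(const ConnOptions *opts, char *buf, size_t buflen)
{
    return snprintf(buf, buflen,
                    "host=%s port=%s user=%s replication=database",
                    opts->host, opts->port, opts->username);
}
```

Each wrapper (GetConnection(), CreateReplicationSlot(), ...) would then take
such a struct instead of consulting globals.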


--
Euler Taveira

Re: speed up a logical replica setup

From
Michael Paquier
Date:
On Wed, Nov 08, 2023 at 09:50:47AM -0300, Euler Taveira wrote:
> On Tue, Nov 7, 2023, at 8:12 PM, Michael Paquier wrote:
> I used the CreateReplicationSlot() from streamutil.h but decided to use the
> CREATE_REPLICATION_SLOT command directly because it needs the LSN as output. As
> you noticed at that time I wouldn't like a dependency in the pg_basebackup
> header files; if we move this binary to base backup directory, it seems natural
> to refactor the referred function and use it.

Right.  That should be OK to store that in an optional XLogRecPtr
pointer, aka by letting the option to pass NULL as argument of the
function if the caller needs nothing.
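The shape Michael suggests — an out-parameter that may be NULL — would look
roughly like this (types and names are illustrative, not the actual streamutil
signature):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the server's typedef */

/* Sketch: CreateReplicationSlot() extended with an optional out-parameter
 * for the LSN reported by CREATE_REPLICATION_SLOT.  Callers that do not
 * need the LSN pass NULL. */
bool
create_slot_sketch(const char *slot_name, XLogRecPtr *consistent_lsn)
{
    /* pretend this value came back from the replication command */
    XLogRecPtr lsn = ((XLogRecPtr) 0x3 << 32) | 0x148;

    (void) slot_name;           /* a real version would run the command */
    if (consistent_lsn != NULL)
        *consistent_lsn = lsn;
    return true;
}
```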

>> I've read the patch, and the additions to streamutil.h and
>> streamutil.c make it kind of natural to have it sit in pg_basebackup/.
>> There's pg_recvlogical already there.  I am wondering about two
>> things, though:
>> - Should the subdirectory pg_basebackup be renamed into something more
>> generic at this point?  All these things are frontend tools that deal
>> in some way with the replication protocol to do their work.  Say
>> a replication_tools?
>
> It is a good fit for this tool since it is another replication tool. I also
> agree with the directory renaming; it seems confusing that the directory has
> the same name as one binary but also contains other related binaries in it.

Or cluster_tools?  Or stream_tools?  replication_tools may be OK, but
I have a bad sense in naming new things around here.  So if anybody
has a better idea, feel free..

>> - And if it would be better to refactor some of the code generic to
>> all these streaming tools to fe_utils.  What makes streamutil.h a bit
>> less pluggable are all its extern variables to control the connection,
>> but perhaps that can be an advantage, as well, in some cases.
>
> I like it. There are common functions such as GetConnection(),
> CreateReplicationSlot(), DropReplicationSlot() and RunIdentifySystem() that is
> used by all of these replication tools. We can move the extern variables into
> parameters to have a pluggable streamutil.h.

And perhaps RetrieveWalSegSize() as well as GetSlotInformation().
These kick replication commands.
--
Michael

Attachment

Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 08.11.23 00:12, Michael Paquier wrote:
> - Should the subdirectory pg_basebackup be renamed into something more
> generic at this point?  All these things are frontend tools that deal
> in some way with the replication protocol to do their work.  Say
> a replication_tools?

Seems like unnecessary churn.  Nobody has complained about any of the 
other tools in there.

> - And if it would be better to refactor some of the code generic to
> all these streaming tools to fe_utils.  What makes streamutil.h a bit
> less pluggable are all its extern variables to control the connection,
> but perhaps that can be an advantage, as well, in some cases.

Does anyone outside of pg_basebackup + existing friends + new friend 
need that?  Seems like extra complications.




Re: speed up a logical replica setup

From
Michael Paquier
Date:
On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> On 08.11.23 00:12, Michael Paquier wrote:
>> - Should the subdirectory pg_basebackup be renamed into something more
>> generic at this point?  All these things are frontend tools that deal
>> in some way with the replication protocol to do their work.  Say
>> a replication_tools?
>
> Seems like unnecessary churn.  Nobody has complained about any of the other
> tools in there.

Not sure.  We rename things across releases in the tree from time to
time, and here that's straight-forward.

>> - And if it would be better to refactor some of the code generic to
>> all these streaming tools to fe_utils.  What makes streamutil.h a bit
>> less pluggable are all its extern variables to control the connection,
>> but perhaps that can be an advantage, as well, in some cases.
>
> Does anyone outside of pg_basebackup + existing friends + new friend need
> that?  Seems like extra complications.

Actually, yes, I've used these utility routines in some past work, and
having the wrapper routines able to run the replication commands in
fe_utils would have been nicer than having to link to a source tree.
--
Michael

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> On 08.11.23 00:12, Michael Paquier wrote:
>> - Should the subdirectory pg_basebackup be renamed into something more
>> generic at this point?  All these things are frontend tools that deal
>> in some way with the replication protocol to do their work.  Say
>> a replication_tools?

> Seems like unnecessary churn.  Nobody has complained about any of the other
> tools in there.

Not sure.  We rename things across releases in the tree from time to
time, and here that's straight-forward.

Based on this discussion it seems we have a consensus that this tool should be
in the pg_basebackup directory. (If/when we agree on the directory renaming, it
could be done in a separate patch.) Besides this move, v3 provides a dry-run
mode. It basically executes every routine but skips the steps that would make
modifications. It is a useful option for checking whether you will be able to
run the tool without hitting issues with connectivity, permissions, or existing
objects (replication slots, publications, subscriptions). Tests were slightly
improved. Messages were changed to *not* provide INFO messages by default;
--verbose provides INFO messages and --verbose --verbose also provides DEBUG
messages. I also refactored the connect_database() function so that the
connection always uses the logical replication mode. A bug in the transient
replication slot name was fixed. Ashutosh's review [1] was incorporated. The
code was also indented.

There are a few suggestions from Ashutosh [2] that I will reply in another
email.

I'm still planning to work on the following points:

1. improve the cleanup routine to point out leftover objects if there is any
   connection issue.
2. remove the physical replication slot if the standby is using one
   (primary_slot_name).
3. provide instructions to promote the logical replica to primary, that is,
   stop the replication between the nodes and remove the replication setup
   (publications, subscriptions, replication slots). Or even include another
   action to do it. We could add both too.

Point 1 should be done. Points 2 and 3 aren't essential but will provide a
nicer UI for users who would like to use them.




--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

On Wed, 6 Dec 2023 at 12:53, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
>
> On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > On 08.11.23 00:12, Michael Paquier wrote:
> >> - Should the subdirectory pg_basebackup be renamed into something more
> >> generic at this point?  All these things are frontend tools that deal
> >> in some way with the replication protocol to do their work.  Say
> >> a replication_tools?
> >
> > Seems like unnecessary churn.  Nobody has complained about any of the other
> > tools in there.
>
> Not sure.  We rename things across releases in the tree from time to
> time, and here that's straight-forward.
>
>
> Based on this discussion it seems we have a consensus that this tool should be
> in the pg_basebackup directory. (If/when we agree with the directory renaming,
> it could be done in a separate patch.) Besides this move, the v3 provides a dry
> run mode. It basically executes every routine but skip when should do
> modifications. It is an useful option to check if you will be able to run it
> without having issues with connectivity, permission, and existing objects
> (replication slots, publications, subscriptions). Tests were slightly improved.
> Messages were changed to *not* provide INFO messages by default and --verbose
> provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> also refactored the connect_database() function into which the connection will
> always use the logical replication mode. A bug was fixed in the transient
> replication slot name. Ashutosh review [1] was included. The code was also indented.
>
> There are a few suggestions from Ashutosh [2] that I will reply in another
> email.
>
> I'm still planning to work on the following points:
>
> 1. improve the cleanup routine to point out leftover objects if there is any
>    connection issue.
> 2. remove the physical replication slot if the standby is using one
>    (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
>    stop the replication between the nodes and remove the replication setup
>    (publications, subscriptions, replication slots). Or even include another
>    action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.
>
>
> [1] https://postgr.es/m/CAExHW5sCAU3NvPKd7msScQKvrBN-x_AdDQD-ZYAwOxuWG%3Doz1w%40mail.gmail.com
> [2] https://postgr.es/m/CAExHW5vHFemFvTUHe+7XWphVZJxrEXz5H3dD4UQi7CwmdMJQYg@mail.gmail.com
>

The changes in the file 'src/tools/msvc/Mkvcbuild.pm' seem unnecessary, as
the 'msvc' folder was removed by commit [1].

To review the changes, I did 'git reset --hard' to the commit just before
commit [1]. I tried to build postgres on my Windows machine using two
methods:
i. building using Visual Studio
ii. building using Meson

When I built the code using Visual Studio, the pg_subscriber binary was not
created on installing postgres. But when I built the code using Meson, the
pg_subscriber binary was created. Is this behaviour intentional?

[1] https://github.com/postgres/postgres/commit/1301c80b2167feb658a738fa4ceb1c23d0991e23

Thanks and Regards,
Shlok Kyal



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Dec 20, 2023, at 9:22 AM, Shlok Kyal wrote:
When I built the code using Visual Studio, on installing postgres,
pg_subscriber binary was not created.
But when I built the code using Meson, on installing postgres,
pg_subscriber binary was created.
Is this behaviour intentional?

No. I will update the patch accordingly. I suspect a fair number of patches
broke due to the MSVC build system removal.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Wed, Dec 6, 2023 at 12:53 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
>
> On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > On 08.11.23 00:12, Michael Paquier wrote:
> >> - Should the subdirectory pg_basebackup be renamed into something more
> >> generic at this point?  All these things are frontend tools that deal
> >> in some way with the replication protocol to do their work.  Say
> >> a replication_tools?
> >
> > Seems like unnecessary churn.  Nobody has complained about any of the other
> > tools in there.
>
> Not sure.  We rename things across releases in the tree from time to
> time, and here that's straight-forward.
>
>
> Based on this discussion it seems we have a consensus that this tool should be
> in the pg_basebackup directory. (If/when we agree with the directory renaming,
> it could be done in a separate patch.) Besides this move, the v3 provides a dry
> run mode. It basically executes every routine but skip when should do
> modifications. It is an useful option to check if you will be able to run it
> without having issues with connectivity, permission, and existing objects
> (replication slots, publications, subscriptions). Tests were slightly improved.
> Messages were changed to *not* provide INFO messages by default and --verbose
> provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> also refactored the connect_database() function into which the connection will
> always use the logical replication mode. A bug was fixed in the transient
> replication slot name. Ashutosh review [1] was included. The code was also indented.
>
> There are a few suggestions from Ashutosh [2] that I will reply in another
> email.
>
> I'm still planning to work on the following points:
>
> 1. improve the cleanup routine to point out leftover objects if there is any
>    connection issue.
>

I think this is an important part. Shall we try to write the pending objects
to be cleaned up to some file? We do something like that during the upgrade.

> 2. remove the physical replication slot if the standby is using one
>    (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
>    stop the replication between the nodes and remove the replication setup
>    (publications, subscriptions, replication slots). Or even include another
>    action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.
>

Isn't point 2 also essential? Otherwise, how would such a slot be advanced
or removed?

A few other points:
==============
1. Previously, I asked whether we need an additional replication slot
patch created to get consistent LSN and I see the following comment in
the patch:

+ *
+ * XXX we should probably use the last created replication slot to get a
+ * consistent LSN but it should be changed after adding pg_basebackup
+ * support.

Yeah, sure, we may want to do that after backup support and we can keep a
comment for the same, but as the patch stands today, I feel there is no good
reason to keep it. Also, is there a reason that we can't create the slots
after the backup is complete and before we write the recovery parameters?

2.
+ appendPQExpBuffer(str,
+   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
+   "WITH (create_slot = false, copy_data = false, enabled = false)",
+   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);

Shouldn't we enable two_phase by default for newly created
subscriptions? Is there a reason for not doing so?
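For reference, enabling it would only add one option to the generated command.
A self-contained sketch using snprintf in place of the patch's PQExpBuffer
(whether to enable it is debated later in the thread; this only shows the
shape of the change):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the SQL pg_subscriber would generate if two_phase were turned
 * on; it mirrors the patch's appendPQExpBuffer() call with one extra
 * option in the WITH list. */
int
build_create_subscription(char *buf, size_t buflen, const char *subname,
                          const char *conninfo, const char *pubname)
{
    return snprintf(buf, buflen,
                    "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
                    "WITH (create_slot = false, copy_data = false, "
                    "enabled = false, two_phase = true)",
                    subname, conninfo, pubname);
}
```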

3. How about sync slots on the physical standby if present? Do we want
to retain those as it is or do we need to remove those? We are
actively working on the patch [1] for the same.

4. Can we see some numbers with various sizes of databases (clusters) to see
how it impacts the time for small to large databases as compared to the
traditional method? This might help us with giving users advice on when to
use this tool. We can do this a bit later as well, when the patch is closer
to being ready for commit.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB5716DAF72265388A2AD424119495A%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Wed, Nov 1, 2023 at 7:10 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Here are some comments about functionality and design.
>
> + <step>
> + <para>
> + <application>pg_subscriber</application> creates one replication slot for
> + each specified database on the source server. The replication slot name
> + contains a <literal>pg_subscriber</literal> prefix. These replication
> + slots will be used by the subscriptions in a future step. Another
> + replication slot is used to get a consistent start location. This
> + consistent LSN will be used as a stopping point in the <xref
> + linkend="guc-recovery-target-lsn"/> parameter and by the
> + subscriptions as a replication starting point. It guarantees that no
> + transaction will be lost.
> + </para>
> + </step>
>
> CREATE_REPLICATION_SLOT would wait for any incomplete transaction to
> complete. So it may not be possible to have an incomplete transaction
> on standby when it comes out of recovery. Am I correct? Can we please
> have a testcase where we test this scenario? What about a prepared
> transactions?
>

It will wait even for prepared transactions to commit. So, there
shouldn't be any behavior difference for prepared and non-prepared
transactions.

> +
> + <step>
> + <para>
> + <application>pg_subscriber</application> writes recovery parameters into
> + the target data directory and start the target server. It specifies a LSN
> + (consistent LSN that was obtained in the previous step) of write-ahead
> + log location up to which recovery will proceed. It also specifies
> + <literal>promote</literal> as the action that the server should take once
> + the recovery target is reached. This step finishes once the server ends
> + standby mode and is accepting read-write operations.
> + </para>
> + </step>
>
> At this stage the standby would have various replication objects like
> publications, subscriptions, origins inherited from the upstream
> server and possibly very much active. With failover slots, it might
> inherit replication slots. Is it intended that the new subscriber also
> acts as publisher for source's subscribers OR that the new subscriber
> should subscribe to the upstreams of the source? Some use cases like
> logical standby might require that but a multi-master multi-node setup
> may not. The behaviour should be user configurable.
>

Good points, but even if we make it user-configurable, how would we exclude
such replication objects? And if we don't exclude them, what would be their
use? If one wants to use this as a logical standby, then we only need
publications and failover/sync slots in it, and there would also be no need
to create new slots and publications on the primary to make the current
physical standby a logical subscriber.

> There may be other objects in this category which need special consideration on
> the subscriber. I haven't fully thought through the list of such objects.
>
> + uses the replication slot that was created in a previous step. The
> + subscription is created but it is not enabled yet. The reason is the
> + replication progress must be set to the consistent LSN but replication
> + origin name contains the subscription oid in its name. Hence, the
>
> Not able to understand the sentence "The reason is ... in its name".
> Why is subscription OID in origin name matters?
>

Using the subscription OID in the origin name is probably to uniquely
identify the origin corresponding to the subscription; we do that while
creating a subscription as well.
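Concretely, the origin name is derived from the subscription's OID
(PostgreSQL uses the form "pg_" plus the OID), which is why the replication
progress can only be set once the subscription exists and its OID is known.
A minimal sketch of the derivation:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;       /* stand-in for the server's typedef */

/* Sketch: replication origin names for subscriptions follow the pattern
 * "pg_<subscription oid>", so pg_subscriber must create the subscription
 * first, look up its OID, and only then can it address the origin to set
 * the progress to the consistent LSN. */
void
make_origin_name(Oid suboid, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "pg_%u", suboid);
}
```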

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
vignesh C
Date:
On Wed, 6 Dec 2023 at 12:53, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
>
> On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > On 08.11.23 00:12, Michael Paquier wrote:
> >> - Should the subdirectory pg_basebackup be renamed into something more
> >> generic at this point?  All these things are frontend tools that deal
> >> in some way with the replication protocol to do their work.  Say
> >> a replication_tools?
> >
> > Seems like unnecessary churn.  Nobody has complained about any of the other
> > tools in there.
>
> Not sure.  We rename things across releases in the tree from time to
> time, and here that's straight-forward.
>
>
> Based on this discussion it seems we have a consensus that this tool should be
> in the pg_basebackup directory. (If/when we agree with the directory renaming,
> it could be done in a separate patch.) Besides this move, the v3 provides a dry
> run mode. It basically executes every routine but skip when should do
> modifications. It is an useful option to check if you will be able to run it
> without having issues with connectivity, permission, and existing objects
> (replication slots, publications, subscriptions). Tests were slightly improved.
> Messages were changed to *not* provide INFO messages by default and --verbose
> provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> also refactored the connect_database() function into which the connection will
> always use the logical replication mode. A bug was fixed in the transient
> replication slot name. Ashutosh review [1] was included. The code was also indented.
>
> There are a few suggestions from Ashutosh [2] that I will reply in another
> email.
>
> I'm still planning to work on the following points:
>
> 1. improve the cleanup routine to point out leftover objects if there is any
>    connection issue.
> 2. remove the physical replication slot if the standby is using one
>    (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
>    stop the replication between the nodes and remove the replication setup
>    (publications, subscriptions, replication slots). Or even include another
>    action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.

1) This Assert can fail if the source is shut down:
+static void
+drop_replication_slot(PGconn *conn, LogicalRepInfo *dbinfo, const
char *slot_name)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+
+       Assert(conn != NULL);

I could simulate it by shutting down the primary while trying to reach the
consistent state:
pg_subscriber: postmaster reached the consistent state
pg_subscriber: error: connection to database failed: connection to
server at "localhost" (127.0.0.1), port 5432 failed: Connection
refused
Is the server running on that host and accepting TCP/IP connections?
pg_subscriber: error: connection to database failed: connection to
server at "localhost" (127.0.0.1), port 5432 failed: Connection
refused
Is the server running on that host and accepting TCP/IP connections?
pg_subscriber: error: connection to database failed: connection to
server at "localhost" (127.0.0.1), port 5432 failed: Connection
refused
Is the server running on that host and accepting TCP/IP connections?
pg_subscriber: pg_subscriber.c:692: drop_replication_slot: Assertion
`conn != ((void *)0)' failed.
Aborted
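A defensive fix along these lines would replace the assertion with a runtime
check so cleanup degrades gracefully when the source is already down (a
sketch; the real code would use pg_subscriber's logging and record the
leftover slot):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Sketch: a cleanup path should tolerate a lost connection instead of
 * asserting.  Returns false when the source server can no longer be
 * reached, leaving the slot behind for the user (or a later cleanup
 * routine) to deal with. */
bool
drop_slot_checked(void *conn, const char *slot_name)
{
    if (conn == NULL)
    {
        fprintf(stderr,
                "could not drop replication slot \"%s\": no connection\n",
                slot_name);
        return false;
    }
    /* ... would issue DROP_REPLICATION_SLOT over the connection here ... */
    return true;
}
```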

2) Should we have some checks to see if the max_replication_slots
configuration is OK based on the number of slots that will be created? We
have similar checks for upgrading replication slots in
check_new_cluster_logical_replication_slots.

3) Should we check if wal_level is set to logical? We have similar checks
for upgrading replication slots in
check_new_cluster_logical_replication_slots.

4) The physical replication slot that was created will still be present on
the primary node; I felt it should be removed.

5) I felt the target server should be started before pg_subscriber
completes:
+       /*
+        * Stop the subscriber.
+        */
+       pg_log_info("stopping the subscriber");
+
+       pg_ctl_cmd = psprintf("\"%s\" stop -D \"%s\" -s", pg_ctl_path,
subscriber_dir);
+       rc = system(pg_ctl_cmd);
+       pg_ctl_status(pg_ctl_cmd, rc, 0);
+
+       /*
+        * Change system identifier.
+        */
+       modify_sysid(pg_resetwal_path, subscriber_dir);
+
+       success = true;
+
+       pg_log_info("Done!");
+
+       return 0;

Regards,
Vignesh



Re: speed up a logical replica setup

From
vignesh C
Date:
On Wed, 1 Nov 2023 at 19:28, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> At this stage the standby would have various replication objects like
> publications, subscriptions, origins inherited from the upstream
> server and possibly very much active. With failover slots, it might
> inherit replication slots. Is it intended that the new subscriber also
> acts as publisher for source's subscribers OR that the new subscriber
> should subscribe to the upstreams of the source? Some use cases like
> logical standby might require that but a multi-master multi-node setup
> may not. The behaviour should be user configurable.

How about we do it like this:
a) Start the server in binary upgrade mode (so that the existing
subscriptions will not try to connect to the publishers).
b) Disable the subscriptions.
c) Drop the replication slots.
d) Drop the publications.
e) Restart the server in normal (non-upgrade) mode.
f) Do the rest of the pg_subscriber work:
create_all_logical_replication_slots, create_subscription,
set_replication_progress, enable_subscription, etc.
This will be done by default. An option --clean-logical-replication-info
will be provided so the DBA can choose to keep these objects instead.
I felt cleaning the replication information is better because: a) after the
pg_subscriber setup is done, Node-1 will replicate to Node-2 all the data
that Node-1 is subscribing to from other nodes; and b) all the data that
Node-1 is publishing need not be published again by Node-2.

Regards,
Vignesh



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Wed, Jan 3, 2024 at 12:09 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 1 Nov 2023 at 19:28, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > At this stage the standby would have various replication objects like
> > publications, subscriptions, origins inherited from the upstream
> > server and possibly very much active. With failover slots, it might
> > inherit replication slots. Is it intended that the new subscriber also
> > acts as publisher for source's subscribers OR that the new subscriber
> > should subscribe to the upstreams of the source? Some use cases like
> > logical standby might require that but a multi-master multi-node setup
> > may not. The behaviour should be user configurable.
>
> How about we do like this:
> a) Starting the server in binary upgrade mode(so that the existing
> subscriptions will not try to connect to the publishers)
>

Can't we simply do it by starting the server with
max_logical_replication_workers = 0, or is there some other need to
start in binary upgrade mode?

 b) Disable
> the subscriptions
>

Why not simply drop the subscriptions?

 c) Drop the replication slots d) Drop the
> publications
>

I am not so sure about dropping publications because, unlike subscriptions,
which can start to pull the data, there is no harm in keeping publications.
Similar to publications, there could be some user-defined functions or other
objects which may not be required once the standby replica is converted to a
subscriber. I guess we need to leave those to the user.

 e) Then restart the server in normal(non-upgrade) mode.
> f) The rest of pg_subscriber work like
> create_all_logical_replication_slots, create_subscription,
> set_replication_progress, enable_subscription, etc
> This will be done by default. There will be an option
> --clean-logical-replication-info provided to allow DBA not to clean
> the objects if DBA does not want to remove these objects.
>

I agree that some kind of switch to control this action would be useful.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
vignesh C
Date:
On Wed, 3 Jan 2024 at 14:49, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 3, 2024 at 12:09 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Wed, 1 Nov 2023 at 19:28, Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > At this stage the standby would have various replication objects like
> > > publications, subscriptions, origins inherited from the upstream
> > > server and possibly very much active. With failover slots, it might
> > > inherit replication slots. Is it intended that the new subscriber also
> > > acts as publisher for source's subscribers OR that the new subscriber
> > > should subscribe to the upstreams of the source? Some use cases like
> > > logical standby might require that but a multi-master multi-node setup
> > > may not. The behaviour should be user configurable.
> >
> > How about we do like this:
> > a) Starting the server in binary upgrade mode(so that the existing
> > subscriptions will not try to connect to the publishers)
> >
>
> Can't we simply do it by starting the server with
> max_logical_replication_workers = 0 or is there some other need to
> start in binary upgrade mode?

I agree, max_logical_replication_workers = 0 is enough for our case.
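In configuration terms, the idea is to start the target server with logical
replication workers disabled while pg_subscriber does its work;
max_logical_replication_workers is an existing GUC, and the setting could be
applied temporarily (for example via pg_ctl -o):

```
# temporary setting while pg_subscriber operates on the target cluster:
# no apply workers start, so inherited subscriptions cannot connect to
# their publishers
max_logical_replication_workers = 0
```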

>  b) Disable
> > the subscriptions
> >
>
> Why not simply drop the subscriptions?

Dropping subscriptions is ok as these subscriptions will not be required.

>  c) Drop the replication slots d) Drop the
> > publications
> >
>
> I am not so sure about dropping publications because, unlike
> subscriptions which can start to pull the data, there is no harm with
> publications. Similar to publications there could be some user-defined
> functions or other objects which may not be required once the
> standby replica is converted to subscriber. I guess we need to leave
> those to the user.

Yes, that makes sense.

Regards,
Vignesh



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Dec 21, 2023, at 3:16 AM, Amit Kapila wrote:
I think this is an important part. Shall we try to write to some file
the pending objects to be cleaned up? We do something like that during
the upgrade.

That's a good idea.

> 2. remove the physical replication slot if the standby is using one
>    (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
>    stop the replication between the nodes and remove the replication setup
>    (publications, subscriptions, replication slots). Or even include another
>    action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.
>

Isn't point 2 also essential? Otherwise, how would such a slot
be advanced or removed?

I'm worried about a scenario in which you will still use the primary. (Let's say
the logical replica will be promoted to a staging or dev server.) There is no
connection between the primary and this new server, so the primary slot is
useless after the promotion.
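For context, removing such a leftover physical slot on the primary is a single
call (the slot name here is illustrative, standing in for whatever the
standby's primary_slot_name was):

```sql
-- on the primary: drop the physical slot the standby was using
SELECT pg_drop_replication_slot('standby_slot');
```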

A few other points:
==============
1. Previously, I asked whether we need an additional replication slot
patch created to get consistent LSN and I see the following comment in
the patch:

+ *
+ * XXX we should probably use the last created replication slot to get a
+ * consistent LSN but it should be changed after adding pg_basebackup
+ * support.

Yeah, sure, we may want to do that after backup support and we can
keep a comment for the same but I feel as the patch stands today,
there is no good reason to keep it.

I'll remove the comment to avoid confusion.

Also, is there a reason that we
can't create the slots after backup is complete and before we write
recovery parameters

No.

2.
+ appendPQExpBuffer(str,
+   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
+   "WITH (create_slot = false, copy_data = false, enabled = false)",
+   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);

Shouldn't we enable two_phase by default for newly created
subscriptions? Is there a reason for not doing so?

Why? I decided to keep the default for some settings (streaming,
synchronous_commit, two_phase, disable_on_error). Unless there is a compelling
reason to enable it, I think we should use the default. Either way, data will
arrive on subscriber as soon as the prepared transaction is committed.
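For reference, if someone did want prepared transactions applied at PREPARE
time, the generated command could be extended explicitly; a hypothetical
variant (object names are examples, not what the tool emits):

```sql
CREATE SUBSCRIPTION pg_subscriber_sub
    CONNECTION 'host=primary port=5432 dbname=postgres'
    PUBLICATION pg_subscriber_pub
    WITH (create_slot = false, copy_data = false, enabled = false,
          two_phase = true);
```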

3. How about sync slots on the physical standby if present? Do we want
to retain those as it is or do we need to remove those? We are
actively working on the patch [1] for the same.

I didn't read the current version of the referred patch but if the proposal is
to synchronize logical replication slots only if you are using physical
replication, as soon as pg_subscriber finishes its execution, there won't be
synchronization on these logical replication slots because there isn't a
physical replication anymore. If the goal is a promotion, the current behavior
is correct because the logical replica will retain WAL since it was converted.
However, if you are creating a logical replica, this WAL retention is not good
and the customer should eventually remove these logical replication slots on
the logical replica.
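As a sketch of that manual cleanup (assuming the leftover synchronized slots
are the only inactive logical slots on the new replica, which is not
guaranteed in general):

```sql
-- on the converted logical replica: drop logical slots that are no
-- longer advanced because the physical link is gone
SELECT pg_drop_replication_slot(slot_name)
  FROM pg_replication_slots
 WHERE slot_type = 'logical' AND NOT active;
```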

4. Can we see some numbers with various sizes of databases (cluster)
to see how it impacts the time for small to large-size databases as
compared to the traditional method? This might help us with giving
users advice on when to use this tool. We can do this bit later as
well when the patch is closer to being ready for commit.

I'll share it.


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Jan 1, 2024, at 7:14 AM, vignesh C wrote:
1) This Assert can fail if the source is shut down:
+static void
+drop_replication_slot(PGconn *conn, LogicalRepInfo *dbinfo, const
char *slot_name)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+
+       Assert(conn != NULL);

Oops. I'll remove it.

2) Should we have some checks to see if the max_replication_slots
configuration is ok based on the number of slots that will be created?
We have similar checks for upgrading replication slots in
check_new_cluster_logical_replication_slots

That's a good idea.

3) Should we check if wal_level is set to logical? We have similar
checks for upgrading replication slots in
check_new_cluster_logical_replication_slots

That's a good idea.
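Both checks could be a couple of queries against the source server; a rough
sketch (the tool would run these via libpq):

```sql
-- wal_level must be 'logical' for logical decoding
SHOW wal_level;

-- free slots must cover one slot per database being converted
-- (plus the extra slot used to get the consistent LSN)
SELECT current_setting('max_replication_slots')::int - count(*)
       AS free_slots
  FROM pg_replication_slots;
```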

4) The physical replication slot that was created will still be
present on the primary node; I felt this should be removed.

My proposal is to remove it [1]. It'll be included in the next version.

5) I felt the target server should be started before completion of
pg_subscriber:

Why? The initial version had an option to stop the subscriber. I decided to
remove the option and stop the subscriber by default mainly because (1) it is
only an extra step to start the server and (2) the WAL retention caused by
additional (synchronized?) replication slots on the subscriber doesn't happen.
It was a conservative choice. If point 2 isn't an issue, imo point 1 is no big
deal.




--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 8:24 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Dec 21, 2023, at 3:16 AM, Amit Kapila wrote:
>
> > 2. remove the physical replication slot if the standby is using one
> >    (primary_slot_name).
> > 3. provide instructions to promote the logical replica into primary, I mean,
> >    stop the replication between the nodes and remove the replication setup
> >    (publications, subscriptions, replication slots). Or even include another
> >    action to do it. We could add both too.
> >
> > Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> > UI for users that would like to use it.
> >
>
> Isn't point 2 also essential because how would otherwise such a slot
> be advanced or removed?
>
>
> I'm worried about a scenario that you will still use the primary. (Let's say
> the logical replica will be promoted to a staging or dev server.) No connection
> between primary and this new server so the primary slot is useless after the
> promotion.
>

So, you also seem to be saying that it is not required once
pg_subscriber has promoted it. Why, then, should it be optional to remove
the physical replication slot? I think we must remove it from the primary
unless there is some other reason.

> A few other points:
> ==============
> 1. Previously, I asked whether we need an additional replication slot
> patch created to get consistent LSN and I see the following comment in
> the patch:
>
> + *
> + * XXX we should probably use the last created replication slot to get a
> + * consistent LSN but it should be changed after adding pg_basebackup
> + * support.
>
> Yeah, sure, we may want to do that after backup support and we can
> keep a comment for the same but I feel as the patch stands today,
> there is no good reason to keep it.
>
>
> I'll remove the comment to avoid confusion.
>

My point is to not have an additional slot and keep a comment
indicating that we need an extra slot once we add pg_basebackup
support.

>
> 2.
> + appendPQExpBuffer(str,
> +   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
> +   "WITH (create_slot = false, copy_data = false, enabled = false)",
> +   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);
>
> Shouldn't we enable two_phase by default for newly created
> subscriptions? Is there a reason for not doing so?
>
>
> Why? I decided to keep the default for some settings (streaming,
> synchronous_commit, two_phase, disable_on_error). Unless there is a compelling
> reason to enable it, I think we should use the default. Either way, data will
> arrive on subscriber as soon as the prepared transaction is committed.
>

I thought we could provide a better experience for logical replicas
created by default but I see your point and probably keeping default
values for parameters you mentioned seems reasonable to me.

> 3. How about sync slots on the physical standby if present? Do we want
> to retain those as it is or do we need to remove those? We are
> actively working on the patch [1] for the same.
>
>
> I didn't read the current version of the referred patch but if the proposal is
> to synchronize logical replication slots only if you are using a physical
> replication, as soon as pg_subscriber finishes the execution, there won't be
> synchronization on these logical replication slots because there isn't a
> physical replication anymore. If the goal is a promotion, the current behavior
> is correct because the logical replica will retain WAL since it was converted.
>

I don't understand what you mean by promotion in this context. If
users want to simply promote the standby then there is no need to do
additional things that this tool is doing.

> However, if you are creating a logical replica, this WAL retention is not good
> and the customer should eventually remove these logical replication slots on
> the logical replica.
>

I think asking users to manually remove such slots won't be a good
idea. We might want to either remove them by default or provide an
option to the user.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 8:52 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Jan 1, 2024, at 7:14 AM, vignesh C wrote:
>
>
> 5) I felt the target server should be started before completion of
> pg_subscriber:
>
>
> Why?
>

Won't it be a better user experience that after setting up the target
server as a logical replica (subscriber), it started to work
seamlessly without user intervention?

> The initial version had an option to stop the subscriber. I decided to
> remove the option and stop the subscriber by default mainly because (1) it is
> an extra step to start the server (another point is that the WAL retention
> doesn't happen due to additional (synchronized?) replication slots on
> subscriber -- point 2). It was a conservative choice. If point 2 isn't an
> issue, imo point 1 is no big deal.
>

By point 2, do you mean to have a check for "max replication slots"?
If so, one possibility is to even increase that config if the
required max_replication_slots is low.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,
I was testing the patch with following test cases:

Test 1 :
- Create a 'primary' node
- Setup physical replica using pg_basebackup "./pg_basebackup -h
localhost -X stream -v -R -W -D ../standby"
- Insert data before and after pg_basebackup
- Run pg_subscriber and then insert some data to check logical
replication "./pg_subscriber -D ../standby -S 'host=localhost
port=9000 dbname=postgres' -P 'host=localhost port=9000
dbname=postgres' -d postgres"
- Also check the pg_publication, pg_subscription and pg_replication_slots tables.

Observation:
Data is not lost. Replication is happening correctly. Pg_subscriber is
working as expected.

Test 2:
- Create a 'primary' node
- Use normal pg_basebackup but don't set up physical replication
"./pg_basebackup -h localhost -v -W -D ../standby"
- Insert data before and after pg_basebackup
- Run pg_subscriber

Observation:
The pg_subscriber command is not completing and is stuck with the
following log repeating:
LOG: waiting for WAL to become available at 0/3000168
LOG: invalid record length at 0/3000150: expected at least 24, got 0

Test 3:
- Create a 'primary' node
- Use normal pg_basebackup but don't set up physical replication
"./pg_basebackup -h localhost -v -W -D ../standby"
- Insert data before pg_basebackup but not after pg_basebackup
- Run pg_subscriber

Observation:
The pg_subscriber command is not completing and is stuck with the
following log repeating:
LOG: waiting for WAL to become available at 0/3000168
LOG: invalid record length at 0/3000150: expected at least 24, got 0

I was not clear about how to use pg_basebackup in this case, can you
let me know if any changes need to be made for test2 and test3.

Thanks and regards
Shlok Kyal



Re: speed up a logical replica setup

From
Ashutosh Bapat
Date:
On Wed, Jan 3, 2024 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>  c) Drop the replication slots d) Drop the
> > publications
> >
>
> I am not so sure about dropping publications because, unlike
> subscriptions which can start to pull the data, there is no harm with
> publications. Similar to publications there could be some user-defined
> functions or other objects which may not be required once the
> standby replica is converted to subscriber. I guess we need to leave
> those to the user.
>

IIUC, the primary use of the pg_subscriber utility is to start a logical
subscription from a physical base backup (to reduce initial sync time),
as against the logical backup taken while creating a subscription. Hence I
am expecting that, apart from this difference, the resultant logical
replica should look similar to a logical replica set up using a
logical subscription sync. So we should not leave any replication
objects around. UDFs (views, and other objects) may have some use on a
logical replica; we may replicate changes to UDFs once DDL replication
is supported. But what good is having the same publications as the primary
also on the logical replica?

--
Best Wishes,
Ashutosh Bapat



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 12:30 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Jan 3, 2024 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >  c) Drop the replication slots d) Drop the
> > > publications
> > >
> >
> > I am not so sure about dropping publications because, unlike
> > subscriptions which can start to pull the data, there is no harm with
> > publications. Similar to publications there could be some user-defined
> > functions or other objects which may not be required once the
> > standby replica is converted to subscriber. I guess we need to leave
> > those to the user.
> >
>
> IIUC, primary use of pg_subscriber utility is to start a logical
> subscription from a physical base backup (to reduce initial sync time)
> as against logical backup taken while creating a subscription. Hence I
> am expecting that apart from this difference, the resultant logical
> replica should look similar to the logical replica setup using a
> logical subscription sync. Hence we should not leave any replication
> objects around. UDFs (views, and other objects) may have some use on a
> logical replica. We may replicate changes to UDF once DDL replication
> is supported. But what good is having the same publications as primary
> also on logical replica?
>

The one use case that comes to my mind is to set up bi-directional
replication. The publishers want to subscribe to the new subscriber.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 12:22 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> Hi,
> I was testing the patch with following test cases:
>
> Test 1 :
> - Create a 'primary' node
> - Setup physical replica using pg_basebackup "./pg_basebackup -h
> localhost -X stream -v -R -W -D ../standby"
> - Insert data before and after pg_basebackup
> - Run pg_subscriber and then insert some data to check logical
> replication "./pg_subscriber -D ../standby -S 'host=localhost
> port=9000 dbname=postgres' -P 'host=localhost port=9000
> dbname=postgres' -d postgres"
> - Also check pg_publication, pg_subscriber and pg_replication_slots tables.
>
> Observation:
> Data is not lost. Replication is happening correctly. Pg_subscriber is
> working as expected.
>
> Test 2:
> - Create a 'primary' node
> - Use normal pg_basebackup but don't set up physical replication
> "./pg_basebackup -h localhost -v -W -D ../standby"
> - Insert data before and after pg_basebackup
> - Run pg_subscriber
>
> Observation:
> Pg_subscriber command is not completing and is stuck with following
> log repeating:
> LOG: waiting for WAL to become available at 0/3000168
> LOG: invalid record length at 0/3000150: expected at least 24, got 0
>

I think probably the required WAL is not copied. Can you use the -X
option to stream WAL as well and then test? But I feel in this case
also, we should wait for some threshold time and then exit with
failure, removing new objects created, if any.

> Test 3:
> - Create a 'primary' node
> - Use normal pg_basebackup but don't set up physical replication
> "./pg_basebackup -h localhost -v -W -D ../standby"
> - Insert data before pg_basebackup but not after pg_basebackup
> - Run pg_subscriber
>
> Observation:
> Pg_subscriber command is not completing and is stuck with following
> log repeating:
> LOG: waiting for WAL to become available at 0/3000168
> LOG: invalid record length at 0/3000150: expected at least 24, got 0
>

This is similar to the previous test and you can try the same option
here as well.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Ashutosh Bapat
Date:
On Thu, Jan 4, 2024 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > But what good is having the same publications as primary
> > also on logical replica?
> >
>
> The one use case that comes to my mind is to set up bi-directional
> replication. The publishers want to subscribe to the new subscriber.

Hmm. Looks like another user controlled cleanup.

--
Best Wishes,
Ashutosh Bapat



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Jan 4, 2024, at 2:41 AM, Amit Kapila wrote:
So, you also seem to be saying that it is not required once
pg_subscriber has promoted it. Why, then, should it be optional to remove
the physical replication slot? I think we must remove it from the primary
unless there is some other reason.

My point is to *always* remove the primary_slot_name on primary.

My point is to not have an additional slot and keep a comment
indicating that we need an extra slot once we add pg_basebackup
support.

Got it.

> 3. How about sync slots on the physical standby if present? Do we want
> to retain those as it is or do we need to remove those? We are
> actively working on the patch [1] for the same.
>
>
> I didn't read the current version of the referred patch but if the proposal is
> to synchronize logical replication slots only if you are using a physical
> replication, as soon as pg_subscriber finishes the execution, there won't be
> synchronization on these logical replication slots because there isn't a
> physical replication anymore. If the goal is a promotion, the current behavior
> is correct because the logical replica will retain WAL since it was converted.
>

I don't understand what you mean by promotion in this context. If
users want to simply promote the standby then there is no need to do
additional things that this tool is doing.

ENOCOFFEE. s/promotion/switchover/

> However, if you are creating a logical replica, this WAL retention is not good
> and the customer should eventually remove these logical replication slots on
> the logical replica.
>

I think asking users to manually remove such slots won't be a good
idea. We might want to either remove them by default or provide an
option to the user.

Am I correct that in the majority of use cases these replication slots will be
useless? If so, let's remove them by default and add an option to control this
behavior (replication slot removal).


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Jan 4, 2024, at 3:05 AM, Amit Kapila wrote:
Won't it be a better user experience that after setting up the target
server as a logical replica (subscriber), it started to work
seamlessly without user intervention?

If we have an option to control the replication slot removal (default is on),
it seems a good UI. Even if the user decides to disable the replication slot
removal, it should print a message saying that these replication slots can
cause WAL retention.

> The initial version had an option to stop the subscriber. I decided to
> remove the option and stop the subscriber by default mainly because (1) it is
> an extra step to start the server (another point is that the WAL retention
> doesn't happen due to additional (synchronized?) replication slots on
> subscriber -- point 2). It was a conservative choice. If point 2 isn't an
> issue, imo point 1 is no big deal.
>

By point 2, do you mean to have a check for "max replication slots"?
If so, one possibility is to even increase that config if the
required max_replication_slots is low.

By point 2, I mean WAL retention (the sentence inside the parentheses).


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 9:18 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Jan 4, 2024, at 2:41 AM, Amit Kapila wrote:
>
>
> I think asking users to manually remove such slots won't be a good
> idea. We might want to either remove them by default or provide an
> option to the user.
>
>
> Am I correct that in the majority of use cases these replication slots will be
> useless?
>

I am not so sure about it. Say, if some sync slots are present, this
means the user wants this replica to be used later as a publisher.
Now, if the existing primary/publisher node is still alive then we
don't need these slots, but if the user wants to switch over to this
new node as well then they may be required.

Is there a possibility that a cascading standby also has a slot on the
current physical replica being converted to a new subscriber?

> If so, let's remove them by default and add an option to control this
> behavior (replication slot removal).
>

The presence of slots on the physical replica indicates that the other
nodes/clusters could be dependent on it, so, I feel by default we
should give an error and if the user uses some option to remove slots
then it is fine to remove them.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 4, 2024 at 9:27 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Jan 4, 2024, at 3:05 AM, Amit Kapila wrote:
>
> Won't it be a better user experience that after setting up the target
> server as a logical replica (subscriber), it started to work
> seamlessly without user intervention?
>
>
> If we have an option to control the replication slot removal (default is on),
> it seems a good UI. Even if the user decides to disable the replication slot
> removal, it should print a message saying that these replication slots can
> cause WAL retention.
>

As pointed out in the previous response, I think we should not proceed
with such a risk of WAL retention and other nodes dependency, we
should either give an ERROR (default) or remove slots, if the user
provides an option. If we do so, do you think by default we can keep
the server started or let the user start it later? I think one
advantage of letting the user start it later is that she gets a chance
to adjust config parameters in postgresql.conf and by default we won't
be using system resources.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
On Thu, 4 Jan 2024 at 16:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 4, 2024 at 12:22 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > Hi,
> > I was testing the patch with following test cases:
> >
> > Test 1 :
> > - Create a 'primary' node
> > - Setup physical replica using pg_basebackup "./pg_basebackup -h
> > localhost -X stream -v -R -W -D ../standby"
> > - Insert data before and after pg_basebackup
> > - Run pg_subscriber and then insert some data to check logical
> > replication "./pg_subscriber -D ../standby -S 'host=localhost
> > port=9000 dbname=postgres' -P 'host=localhost port=9000
> > dbname=postgres' -d postgres"
> > - Also check pg_publication, pg_subscriber and pg_replication_slots tables.
> >
> > Observation:
> > Data is not lost. Replication is happening correctly. Pg_subscriber is
> > working as expected.
> >
> > Test 2:
> > - Create a 'primary' node
> > - Use normal pg_basebackup but don't set up physical replication
> > "./pg_basebackup -h localhost -v -W -D ../standby"
> > - Insert data before and after pg_basebackup
> > - Run pg_subscriber
> >
> > Observation:
> > Pg_subscriber command is not completing and is stuck with following
> > log repeating:
> > LOG: waiting for WAL to become available at 0/3000168
> > LOG: invalid record length at 0/3000150: expected at least 24, got 0
> >
>
> I think probably the required WAL is not copied. Can you use the -X
> option to stream WAL as well and then test? But I feel in this case
> also, we should wait for some threshold time and then exit with
> failure, removing new objects created, if any.

I have tested with the -X stream option in pg_basebackup as well. In this
case also, the pg_subscriber command is getting stuck.
logs:
2024-01-05 11:49:34.436 IST [61948] LOG:  invalid resource manager ID
102 at 0/3000118
2024-01-05 11:49:34.436 IST [61948] LOG:  waiting for WAL to become
available at 0/3000130

>
> > Test 3:
> > - Create a 'primary' node
> > - Use normal pg_basebackup but don't set up physical replication
> > "./pg_basebackup -h localhost -v -W -D ../standby"
> > - Insert data before pg_basebackup but not after pg_basebackup
> > - Run pg_subscriber
> >
> > Observation:
> > Pg_subscriber command is not completing and is stuck with following
> > log repeating:
> > LOG: waiting for WAL to become available at 0/3000168
> > LOG: invalid record length at 0/3000150: expected at least 24, got 0
> >
>
> This is similar to the previous test and you can try the same option
> here as well.
For this test as well, I tried with the -X stream option in pg_basebackup.
It is getting stuck here as well, with a similar log.

Will investigate the issue further.


Thanks and regards
Shlok Kyal



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

I love your proposal, so I want to join the review. Here are my first comments.

01.
Should we restrict `--subscriber-conninfo` so that it must not have a hostname
or IP? We want users to execute pg_subscriber on the target, right?

02.
When the application was executed, a lot of output filled my screen. Some of it
was from pg_subscriber, and the rest was server log. Can we record them in
separate files? I imagine something like pg_upgrade.

03.
A replication command is used when replication slots are created. Is there a
reason to use it? I think we do not have to use the logical replication
walsender mode; we can use an SQL function instead.
pg_create_logical_replication_slot() also outputs the LSN; isn't that sufficient?
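For reference, the SQL-level route returns the consistent point directly (the
slot name and the pgoutput plugin below are examples):

```sql
-- returns (slot_name, lsn); lsn is the consistent decoding start point
SELECT slot_name, lsn
  FROM pg_create_logical_replication_slot('pg_subscriber_1', 'pgoutput');
```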

04.
As you know, there are several options for publications/subscriptions/replication
slots. Do you have a good way to specify them in your mind?

05.
I found that the connection string for each subscription has the setting
"fallback_application_name=pg_subscriber". Can we remove it?

```
postgres=# SELECT subconninfo FROM pg_subscription;
                                   subconninfo
---------------------------------------------------------------------------------
 user=postgres port=5431 fallback_application_name=pg_subscriber dbname=postgres
(1 row)
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Jan 5, 2024 at 3:36 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> I love your proposal, so I want to join the review. Here are my first comments.
>
> 01.
> Should we restrict that `--subscriber-conninfo` must not have hostname or IP?
> We want users to execute pg_subscriber on the target, right?
>

I don't see any harm in users giving that information, but we should
have some checks to ensure that the server is in standby mode and is
running locally. The other related point is: do we need to take input
for the target cluster directory from the user? Can't we fetch that
information once we are connected to the standby?
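A sketch of both checks over an existing connection to the target:

```sql
-- true while the server is running as a standby (in recovery)
SELECT pg_is_in_recovery();

-- the data directory can be fetched from the server itself
SHOW data_directory;
```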

>
> 05.
> I found that the connection string for each subscriptions have a setting
> "fallback_application_name=pg_subscriber". Can we remove it?
>
> ```
> postgres=# SELECT subconninfo FROM pg_subscription;
>                                    subconninfo
> ---------------------------------------------------------------------------------
>  user=postgres port=5431 fallback_application_name=pg_subscriber dbname=postgres
> (1 row)
> ```

Can that help distinguish the pg_subscriber connection on the publisher?


--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> On Fri, Jan 5, 2024 at 3:36 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > I love your proposal, so I want to join the review. Here are my first comments.
> >
> > 01.
> > Should we restrict that `--subscriber-conninfo` must not have hostname or IP?
> > We want users to execute pg_subscriber on the target, right?
> >
> 
> I don't see any harm in users giving those information but we should
> have some checks to ensure that the server is in standby mode and is
> running locally. The other related point is do we need to take input
> for the target cluster directory from the user? Can't we fetch that
> information once we are connected to standby?

I think that functions like inet_client_addr() may be usable, but it returns
NULL only when the connection is via a Unix-domain socket. Can we restrict
pg_subscriber to use such a socket?
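A sketch of that check: since inet_client_addr() is NULL only for
Unix-domain-socket connections, the tool could verify it like this:

```sql
-- a NULL client address means the connection came over a Unix-domain socket
SELECT inet_client_addr() IS NULL AS via_unix_socket;
```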

> >
> > 05.
> > I found that the connection string for each subscriptions have a setting
> > "fallback_application_name=pg_subscriber". Can we remove it?
> >
> > ```
> > postgres=# SELECT subconninfo FROM pg_subscription;
> >                                    subconninfo
> >
> ---------------------------------------------------------------------------------
> >  user=postgres port=5431 fallback_application_name=pg_subscriber
> dbname=postgres
> > (1 row)
> > ```
> 
> Can that help distinguish the pg_subscriber connection on the publisher?
>

Note that this connection string is used between the publisher instance and the
subscriber instance (not the pg_subscriber client application). Also, the
fallback_application_name would be replaced with the name of the subscriber in
run_apply_worker()->walrcv_connect(), so the value would not actually be used.
See the output below on the publisher.

```
publisher=# SELECT application_name, backend_type FROM pg_stat_activity where backend_type = 'walsender';
   application_name   | backend_type 
----------------------+--------------
 pg_subscriber_5_9411 | walsender
(1 row)
```

Or, if you mean that this can distinguish whether the subscription is used
by pg_subscriber or not: I think the current name format is sufficient.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Jan 8, 2024 at 12:35 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > On Fri, Jan 5, 2024 at 3:36 PM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > I love your proposal, so I want to join the review. Here are my first comments.
> > >
> > > 01.
> > > Should we restrict that `--subscriber-conninfo` must not have hostname or IP?
> > > We want users to execute pg_subscriber on the target, right?
> > >
> >
> > I don't see any harm in users giving those information but we should
> > have some checks to ensure that the server is in standby mode and is
> > running locally. The other related point is do we need to take input
> > for the target cluster directory from the user? Can't we fetch that
> > information once we are connected to standby?
>
> I think that functions like inet_client_addr() may be usable, but it returns
> NULL only when the connection is via a Unix-domain socket. Can we restrict
> pg_subscriber to use such a socket?
>

Good question. So, IIUC, this tool has a requirement to run locally
where the standby is present because we want to write the recovery.conf file.
I am not sure if it is a good idea to have a restriction to use only
Unix-domain sockets, as users need to set up the standby for that by
configuring unix_socket_directories. It is fine if we can't ensure
that it is running locally, but we should at least ensure that the
server is a physical standby node, to avoid the problems that Shlok has
reported.

On a related point, I see that the patch stops the standby server (if
it is running) before starting with the subscriber-side steps. I was
wondering if users could object that some important data replication
was in progress which this tool stopped. OTOH, once the user uses
pg_subscriber, the standby server will be converted to a subscriber
anyway, so it may not be useful as a physical replica. Do you or
others have any thoughts on this matter?

> > >
> > > 05.
> > > I found that the connection string for each subscriptions have a setting
> > > "fallback_application_name=pg_subscriber". Can we remove it?
> > >
> > > ```
> > > postgres=# SELECT subconninfo FROM pg_subscription;
> > >                                    subconninfo
> > >
> > ---------------------------------------------------------------------------------
> > >  user=postgres port=5431 fallback_application_name=pg_subscriber
> > dbname=postgres
> > > (1 row)
> > > ```
> >
> > Can that help distinguish the pg_subscriber connection on the publisher?
> >
>
> Note that this connection string is used between the publisher instance and the
> subscriber instance (not the pg_subscriber client application). Also, the
> fallback_application_name would be replaced with the name of the subscriber in
> run_apply_worker()->walrcv_connect(). Actually, the value would not be used.
>

Fair point. It is not clear what other purpose this can achieve;
perhaps Euler has something in mind for it.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> > > I don't see any harm in users giving those information but we should
> > > have some checks to ensure that the server is in standby mode and is
> > > running locally. The other related point is do we need to take input
> > > for the target cluster directory from the user? Can't we fetch that
> > > information once we are connected to standby?
> >
> > I think that functions like inet_client_addr() may be able to use, but it returns
> > NULL only when the connection is via a Unix-domain socket. Can we restrict
> > pg_subscriber to use such a socket?
> >
> 
> Good question. So, IIUC, this tool has a requirement to run locally
> where the standby is present because we want to write the recovery.conf file.
> I am not sure if it is a good idea to restrict it to using only
> the Unix-domain socket, as users would need to set up the standby for that by
> configuring unix_socket_directories. It is fine if we can't ensure
> that it is running locally, but we should at least ensure that the
> server is a physical standby node to avoid the problems that Shlok has
> reported.

While thinking more about it, I realized that we have not defined a policy on
whether users must not connect to the target while pg_subscriber is running.
What should it be? If connections should be avoided, some parameters like
listen_addresses and unix_socket_permissions should be restricted, like
start_postmaster() does in pg_upgrade/server.c. Also, the port number should be
changed to another value as well.

Personally, I vote to reject connections during the pg_subscriber run.

> On a related point, I see that the patch stops the standby server (if
> it is running) before starting with subscriber-side steps. I was
> wondering if users can object to it that there was some important data
> replication in progress which this tool has stopped. Now, OTOH,
> anyway, once the user uses pg_subscriber, the standby server will be
> converted to a subscriber, so it may not be useful as a physical
> replica. Do you or others have any thoughts on this matter?

I assumed that connections should be closed before running pg_subscriber. If so,
it may be better to just fail the command when the physical standby has already
been started. There is no single answer as to whether data replication and user
queries should stop; users should choose when to stop based on their own policy,
and then pg_subscriber can start the postmaster.
pg_upgrade does the same thing in setup().

====

Further comment:
According to the doc, pg_subscriber is currently listed among the client
applications. But based on the definition, I feel it should be on the
"PostgreSQL Server Applications" page. What do you think? The definition is:

>
This part contains reference information for PostgreSQL server applications and
support utilities. These commands can only be run usefully on the host where the
database server resides. Other utility programs are listed in PostgreSQL Client
Applications.
>

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Tue, Jan 9, 2024 at 12:31 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > > > I don't see any harm in users giving those information but we should
> > > > have some checks to ensure that the server is in standby mode and is
> > > > running locally. The other related point is do we need to take input
> > > > for the target cluster directory from the user? Can't we fetch that
> > > > information once we are connected to standby?
> > >
> > > I think that functions like inet_client_addr() may be able to use, but it returns
> > > NULL only when the connection is via a Unix-domain socket. Can we restrict
> > > pg_subscriber to use such a socket?
> > >
> >
> > Good question. So, IIUC, this tool has a requirement to run locally
> > where the standby is present because we want to write the recovery.conf file.
> > I am not sure if it is a good idea to have a restriction to use only
> > the unix domain socket as users need to set up the standby for that by
> > configuring unix_socket_directories. It is fine if we can't ensure
> > that it is running locally but we should at least ensure that the
> > server is a physical standby node to avoid the problems as Shlok has
> > reported.
>
> While thinking more about it, I found that we did not define the policy
> whether user must not connect to the target while running pg_subscriber. What
> should be? If it should be avoided, some parameters like listen_addresses and
> unix_socket_permissions should be restricted like start_postmaster() in
> pg_upgrade/server.c.
>

Yeah, this makes sense to me.

> Also, the port number should be changed to another value
> as well.
>

Fair point, but I think in that case we should take this as one of the
parameters.

> Personally, I vote to reject connections during the pg_subscriber.
>
> > On a related point, I see that the patch stops the standby server (if
> > it is running) before starting with subscriber-side steps. I was
> > wondering if users can object to it that there was some important data
> > replication in progress which this tool has stopped. Now, OTOH,
> > anyway, once the user uses pg_subscriber, the standby server will be
> > converted to a subscriber, so it may not be useful as a physical
> > replica. Do you or others have any thoughts on this matter?
>
> I assumed that connections should be closed before running pg_subscriber. If so,
> it may be better to just fail the command when the physical standby has already
> been started. There is no single answer as to whether data replication and user
> queries should stop; users should choose when to stop based on their own policy,
> and then pg_subscriber can start the postmaster.
> pg_upgrade does the same thing in setup().
>

Agreed.

> ====
>
> Further comment:
> According to the doc, pg_subscriber is currently listed among the client
> applications. But based on the definition, I feel it should be on the
> "PostgreSQL Server Applications" page. What do you think?
>

I also think it should be a server application.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
On Fri, 5 Jan 2024 at 12:19, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Thu, 4 Jan 2024 at 16:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jan 4, 2024 at 12:22 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > Hi,
> > > I was testing the patch with following test cases:
> > >
> > > Test 1 :
> > > - Create a 'primary' node
> > > - Setup physical replica using pg_basebackup  "./pg_basebackup -h
> > > localhost -X stream -v -R -W -D ../standby "
> > > - Insert data before and after pg_basebackup
> > > - Run pg_subscriber and then insert some data to check logical
> > > replication "./pg_subscriber -D ../standby -S "host=localhost
> > > port=9000 dbname=postgres" -P "host=localhost port=9000
> > > dbname=postgres" -d postgres"
> > > - Also check the pg_publication, pg_subscription and pg_replication_slots tables.
> > >
> > > Observation:
> > > Data is not lost. Replication is happening correctly. Pg_subscriber is
> > > working as expected.
> > >
> > > Test 2:
> > > - Create a 'primary' node
> > > - Use normal pg_basebackup but don't set up physical replication
> > > "./pg_basebackup -h localhost -v -W -D ../standby"
> > > - Insert data before and after pg_basebackup
> > > - Run pg_subscriber
> > >
> > > Observation:
> > > Pg_subscriber command is not completing and is stuck with following
> > > log repeating:
> > > LOG: waiting for WAL to become available at 0/3000168
> > > LOG: invalid record length at 0/3000150: expected at least 24, got 0
> > >
> >
> > I think probably the required WAL is not copied. Can you use the -X
> > option to stream WAL as well and then test? But I feel in this case
> > also, we should wait for some threshold time and then exit with
> > failure, removing new objects created, if any.
>
> I have tested with -X stream option in pg_basebackup as well. In this
> case also the pg_subscriber command is getting stuck.
> logs:
> 2024-01-05 11:49:34.436 IST [61948] LOG:  invalid resource manager ID
> 102 at 0/3000118
> 2024-01-05 11:49:34.436 IST [61948] LOG:  waiting for WAL to become
> available at 0/3000130
>
> >
> > > Test 3:
> > > - Create a 'primary' node
> > > - Use normal pg_basebackup but don't set up physical replication
> > > "./pg_basebackup -h localhost -v -W -D ../standby"
> > > -Insert data before pg_basebackup but not after pg_basebackup
> > > -Run pg_subscriber
> > >
> > > Observation:
> > > Pg_subscriber command is not completing and is stuck with following
> > > log repeating:
> > > LOG: waiting for WAL to become available at 0/3000168
> > > LOG: invalid record length at 0/3000150: expected at least 24, got 0
> > >
> >
> > This is similar to the previous test and you can try the same option
> > here as well.
> For this test as well, I tried with the -X stream option in pg_basebackup.
> It is getting stuck here as well with a similar log.
>
> Will investigate the issue further.

I noticed that pg_subscriber gets stuck when we run it on a node
which is not a standby. It is because of this code:
+   conn = connect_database(dbinfo[0].pubconninfo);
+   if (conn == NULL)
+       exit(1);
+   consistent_lsn = create_logical_replication_slot(conn, &dbinfo[0],
+                                                    temp_replslot);
+
.....
+else
+   {
+       appendPQExpBuffer(recoveryconfcontents, "recovery_target_lsn = '%s'\n",
+                         consistent_lsn);
+       WriteRecoveryConfig(conn, subscriber_dir, recoveryconfcontents);
+   }

Here the standby node would be waiting for the 'consistent_lsn' WAL
during recovery, but this WAL will not be present on the standby if no
physical replication is set up. Hence the command will wait
indefinitely for the WAL.
To solve this, I added a timeout of 60s for the recovery process and also
added a check so that pg_subscriber gives an error when it is called
for a node which is not in physical replication.
I have attached a patch for the same. It is a top-up patch of the
patch shared by Euler at [1].
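The bounded wait could be shaped like the generic poll loop below (a sketch only: wait_for_condition and the sample predicates are invented names, the predicate stands in for "has recovery reached the consistent LSN?", and the 60s default would presumably become a user-settable option):

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Sample predicates standing in for a real recovery-progress check. */
static int always_ready(void *arg) { (void) arg; return 1; }
static int never_ready(void *arg)  { (void) arg; return 0; }

/*
 * Poll a readiness predicate once per second until it returns true or
 * the timeout (in seconds) expires.  Returns 1 on success, 0 on
 * timeout, in which case the caller should clean up any objects it
 * created and exit with failure.
 */
static int
wait_for_condition(int (*ready) (void *), void *arg, int timeout_secs)
{
	time_t		deadline = time(NULL) + timeout_secs;

	for (;;)
	{
		if (ready(arg))
			return 1;
		if (time(NULL) >= deadline)
			return 0;
		sleep(1);
	}
}
```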

Please review the changes and merge the changes if it looks ok.

[1] - https://www.postgresql.org/message-id/e02a2c17-22e5-4ba6-b788-de696ab74f1e%40app.fastmail.com

Thanks and regards
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Wed, 6 Dec 2023 at 12:53, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
>
> On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > On 08.11.23 00:12, Michael Paquier wrote:
> >> - Should the subdirectory pg_basebackup be renamed into something more
> >> generic at this point?  All these things are frontend tools that deal
> >> in some way with the replication protocol to do their work.  Say
> >> a replication_tools?
> >
> > Seems like unnecessary churn.  Nobody has complained about any of the other
> > tools in there.
>
> Not sure.  We rename things across releases in the tree from time to
> time, and here that's straight-forward.
>
>
> Based on this discussion it seems we have a consensus that this tool should be
> in the pg_basebackup directory. (If/when we agree with the directory renaming,
> it could be done in a separate patch.) Besides this move, the v3 provides a dry
> run mode. It basically executes every routine but skips anything that would
> make modifications. It is a useful option to check whether you will be able to
> run it without having issues with connectivity, permissions, and existing
> objects (replication slots, publications, subscriptions). Tests were slightly
> improved. Messages were changed to *not* provide INFO messages by default;
> --verbose provides INFO messages and --verbose --verbose also provides DEBUG
> messages. I also refactored the connect_database() function so that the
> connection will always use the logical replication mode. A bug was fixed in the
> transient replication slot name. Ashutosh's review [1] was included. The code
> was also indented.
>
> There are a few suggestions from Ashutosh [2] that I will reply in another
> email.
>
> I'm still planning to work on the following points:
>
> 1. improve the cleanup routine to point out leftover objects if there is any
>    connection issue.
> 2. remove the physical replication slot if the standby is using one
>    (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
>    stop the replication between the nodes and remove the replication setup
>    (publications, subscriptions, replication slots). Or even include another
>    action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.

Few comments:
1) We should not allow specifying the same database name twice, as we
would try to create the slot multiple times on the publisher; this can
be detected while parsing the options and an error can be thrown:
+                       case 'd':
+                               simple_string_list_append(&database_names, optarg);
+                               num_dbs++;
+                               break;

+static bool
+create_all_logical_replication_slots(LogicalRepInfo *dbinfo)
+{
+       int                     i;
+
+       for (i = 0; i < num_dbs; i++)
+       {
+               PGconn     *conn;
+               PGresult   *res;
+               char            replslotname[NAMEDATALEN];
....
....
....
+               /* Create replication slot on publisher. */
+               if (create_logical_replication_slot(conn, &dbinfo[i], replslotname) != NULL || dry_run)
+                       pg_log_info("create replication slot \"%s\" on publisher", replslotname);
+               else
+                       return false;
+
+               disconnect_database(conn);
+       }

E.g.: pg_subscriber -d postgres -d postgres
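The early check could be a simple scan over the collected names at option-parsing time, along these lines (a sketch; has_duplicate_dbname is an invented helper, and the real code would iterate the SimpleStringList rather than a plain array):

```c
#include <assert.h>
#include <string.h>

/*
 * Report whether any database name appears more than once, so that
 * "pg_subscriber -d postgres -d postgres" can fail early instead of
 * attempting to create the same replication slot twice.
 */
static int
has_duplicate_dbname(const char **names, int n)
{
	for (int i = 0; i < n; i++)
		for (int j = i + 1; j < n; j++)
			if (strcmp(names[i], names[j]) == 0)
				return 1;
	return 0;
}
```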

2) 2023 should be changed to 2024
+/*-------------------------------------------------------------------------
+ *
+ * pg_subscriber.c
+ *       Create a new logical replica from a standby server
+ *
+ * Copyright (C) 2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *             src/bin/pg_subscriber/pg_subscriber.c
+ *
+ *-------------------------------------------------------------------------
+ */

3) Similarly here too:
diff --git a/src/bin/pg_basebackup/t/040_pg_subscriber.pl
b/src/bin/pg_basebackup/t/040_pg_subscriber.pl
new file mode 100644
index 0000000000..9d20847dc2
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_subscriber.pl
@@ -0,0 +1,44 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+#
+# Test checking options of pg_subscriber.
+#

4) Similarly here too:
diff --git a/src/bin/pg_basebackup/t/041_pg_subscriber_standby.pl
b/src/bin/pg_basebackup/t/041_pg_subscriber_standby.pl
new file mode 100644
index 0000000000..ce25608c68
--- /dev/null
+++ b/src/bin/pg_basebackup/t/041_pg_subscriber_standby.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+#
+# Test using a standby server as the subscriber.

Regards,
Vignesh



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Jan 10, 2024, at 1:33 AM, Shlok Kyal wrote:
Here the standby node would be waiting for the 'consistent_lsn' WAL
during recovery, but this WAL will not be present on the standby if no
physical replication is set up. Hence the command will wait
indefinitely for the WAL.

Hmm. Some validations are missing.

To solve this, I added a timeout of 60s for the recovery process and also
added a check so that pg_subscriber gives an error when it is called
for a node which is not in physical replication.
I have attached a patch for the same. It is a top-up patch of the
patch shared by Euler at [1].

If the user has a node that is not a standby and does not set the GUCs to
start the recovery process from a backup, the initial setup is broken. (That's
the case you described.) A good UI would detect this scenario earlier.
Unfortunately, there isn't a reliable and cheap way to do it. You need to start
the recovery and check if it is making progress. (I don't have a strong
opinion about requiring a standby to use this tool. It would reduce the
complexity of checking whether the setup meets all the requirements to run this
tool.)


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 11, 2024 at 7:59 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Jan 10, 2024, at 1:33 AM, Shlok Kyal wrote:
>
> Here the standby node would be waiting for the 'consistent_lsn' wal
> during recovery but this wal will not be present on standby if no
> physical replication is setup. Hence the command will be waiting
> infinitely for the wal.
>
>
> Hmm. Some validations are missing.
>
> To solve this added a timeout of 60s for the recovery process and also
> added a check so that pg_subscriber would give a error when it called
> for node which is not in physical replication.
> Have attached the patch for the same. It is a top-up patch of the
> patch shared by Euler at [1].
>
>
> If the user has a node that is not a standby and does not set the GUCs to
> start the recovery process from a backup, the initial setup is broken. (That's
> the case you described.) A good UI would detect this scenario earlier.
> Unfortunately, there isn't a reliable and cheap way to do it. You need to start
> the recovery and check if it is making progress. (I don't have a strong
> opinion about requiring a standby to use this tool. It would reduce the
> complexity of checking whether the setup meets all the requirements to run this
> tool.)
>

Right, such a check will reduce some complexity. So, +1 for the check
as proposed by Shlok. Also, what are your thoughts on a timeout during
the wait? I think it is okay to wait for 60s by default but there
should be an option for users to wait for longer.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

I have been concerned that the patch has not been tested by cfbot due to an
application error. Also, some comments were raised. Therefore, I created a patch
to move forward.
I also tried to address some comments that were not strongly insisted on by
others. They are included in the 0003 patch.

* 0001 patch
It is almost the same as v3-0001, which was posted by Euler.
An unnecessary change for Mkvcbuild.pm (this file was removed) was ignored.

* 0002 patch
This contains small fixes to keep the compiler quiet.

* 0003 patch
This addresses comments posted to -hackers. For now, it does not contain docs.
I will add them if everyone agrees with these ideas.

1.
An option --port was added to control the port number for the physical standby.
Users can specify a port number via the option, or the environment variable PGSUBPORT.
If not specified, a fixed value (50111) is used.

SOURCE: [1]

2.
A FATAL error is raised if --subscriber-conninfo specifies a non-local server.

SOURCE: [2]

3. 
Options -o/-O were added to specify options for publications/subscriptions.

SOURCE: [2]

4. 
Made the standby save its output to a log file.

SOURCE: [2]

5. 
Unnecessary Assert in drop_replication_slot() was removed.

SOURCE: [3]

What do you think?
Thanks to Shlok and Vignesh for working with me offline.

[1]:
https://www.postgresql.org/message-id/TY3PR01MB988978C7362A101927070D29F56A2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TY3PR01MB9889593399165B9A04106741F5662%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CALDaNm098Jkbh%2Bye6zMj9Ro9j1bBe6FfPV80BFbs1%3DpUuTJ07g%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Jan 11, 2024, at 9:18 AM, Hayato Kuroda (Fujitsu) wrote:
I have been concerned that the patch has not been tested by cfbot due to an
application error. Also, some comments were raised. Therefore, I created a patch
to move forward.

Let me send an updated patch to hopefully keep the CF bot happy. The following
items are included in this patch:

* drop physical replication slot if standby is using one [1].
* cleanup small changes (copyright, .gitignore) [2][3]
* fix getopt_long() options [2]
* fix format specifier for some messages
* move doc to Server Application section [4]
* fix assert failure
* ignore duplicate database names [2]
* store subscriber server log into a separate file
* remove MSVC support

I'm still addressing other reviews and I'll post another version that includes
it soon.



--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Jan 11, 2024, at 9:18 AM, Hayato Kuroda (Fujitsu) wrote:

I have been concerned that the patch has not been tested by cfbot due to an
application error. Also, some comments were raised. Therefore, I created a patch
to move forward.
I also tried to address some comments that were not strongly insisted on by
others. They were included in the 0003 patch.

[I removed the following part in the previous email and couldn't reply to it...]

* 0001 patch
It is almost the same as v3-0001, which was posted by Euler.
An unnecessary change for Mkvcbuild.pm (this file was removed) was ignored.

v5 removes the MSVC support.

* 0002 patch
This contains small fixes to keep the compiler quiet.

I applied it, although I used a different approach for the format specifiers.

* 0003 patch
This addresses comments posted to -hackers. For now, it does not contain docs.
I will add them if everyone agrees with these ideas.

I didn't review all items but ...

1.
An option --port was added to control the port number for physical standby.
Users can specify a port number via the option, or an environment variable PGSUBPORT.
If not specified, a fixed value (50111) would be used.

My first reaction as a new user would be: why do I need to specify a port if my
--subscriber-conninfo already contains a port? Ugh. I'm wondering if we can do
it behind the scenes. Try a range of ports.
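Doing it behind the scenes could amount to a loopback bind probe over a port range, roughly as below (illustration only: find_free_port is hypothetical, there is an unavoidable race between probing and actually starting the postmaster, and the Unix-domain socket case is not covered):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Return the first TCP port in [lo, hi] that can currently be bound
 * on the loopback interface, or -1 if none can.
 */
static int
find_free_port(int lo, int hi)
{
	for (int port = lo; port <= hi; port++)
	{
		struct sockaddr_in addr;
		int			fd = socket(AF_INET, SOCK_STREAM, 0);
		int			ok;

		if (fd < 0)
			return -1;
		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
		addr.sin_port = htons((unsigned short) port);
		ok = (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0);
		close(fd);
		if (ok)
			return port;		/* free right now; may be taken later */
	}
	return -1;
}
```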

2.
A FATAL error would be raised if --subscriber-conninfo specifies non-local server.

Extra protection is always good. However, let's make sure this code path is
really useful. I'll think a bit about it.

3. 
Options -o/-O were added to specify options for publications/subscriptions.

Flexibility is cool. However, I think its cost-benefit is not good. You
have to parse the options to catch preliminary errors. Things like publishing
only deletes, and subscription options that conflict with the embedded ones, are
additional sources of failure.

4. 
Made the standby save its output to a log file.

It was already done in v5; I did it in a different way.

5. 
Unnecessary Assert in drop_replication_slot() was removed.

Instead, I fixed the code and kept the assert.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Jan 12, 2024 at 4:30 AM Euler Taveira <euler@eulerto.com> wrote:
>
>
> 3.
> Options -o/-O were added to specify options for publications/subscriptions.
>
>
> Flexibility is cool. However, I think the cost benefit of it is not good. You
> have to parse the options to catch preliminary errors. Things like publish only
> delete and subscription options that conflicts with the embedded ones are
> additional sources of failure.
>

Yeah, I am also not sure we need those. Did we discuss that
previously? OTOH, we may consider enhancing this tool later if there
is user demand for such options.

BTW, I think we need some way to at least drop the existing
subscriptions, otherwise the newly created subscriber will attempt to
fetch data that may not be intended. Ashutosh argued earlier in the
thread that we need an option for publications as well.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit, Euler,

> >
> >
> > 3.
> > Options -o/-O were added to specify options for publications/subscriptions.
> >
> >
> > Flexibility is cool. However, I think the cost benefit of it is not good. You
> > have to parse the options to catch preliminary errors. Things like publish only
> > delete and subscription options that conflicts with the embedded ones are
> > additional sources of failure.
> >
> 
> Yeah, I am also not sure we need those. Did we discussed about that
> previously? OTOH, we may consider to enhance this tool later if we
> have user demand for such options.

OK, so let's drop it for now and consider it later as an enhancement.
As Euler said, it leads to additional ERRORs.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Sorry for disturbing your work, and thanks for the updates.
I will review your patch again.

>
* 0001 patch
It is almost the same as v3-0001, which was posted by Euler.
An unnecessary change for Mkvcbuild.pm (this file was removed) was ignored.

v5 removes the MSVC support.
>

Confirmed that the patch could be applied.

>
* 0002 patch
This contains small fixes to keep complier quiet.

I applied it. Although, I used a different approach for format specifier.
>

Good, all warnings were removed. However, the patch failed to pass tests on FreeBSD twice.
I'm not quite sure about the ERROR, but is it related to us?

>
* 0003 patch
This addresses comments posted to -hackers. For now, this does not contain a doc.
Will add if everyone agrees these idea.

I didn't review all items but ...

1.
An option --port was added to control the port number for physical standby.
Users can specify a port number via the option, or an environment variable PGSUBPORT.
If not specified, a fixed value (50111) would be used.

My first reaction as a new user would be: why do I need to specify a port if my
--subscriber-conninfo already contains a port? Ugh. I'm wondering if we can do
it behind the scenes. Try a range of ports.
>

My initial motivation for the setting was to avoid establishing connections
during the pg_subscriber run. On further consideration, I started to think that
--subscriber-conninfo may not be needed. pg_upgrade does not require such a
string: it requires a username, and optionally a port number (which is used
during the upgrade), instead. The advantage of this approach is that we do not
have to parse the connection string.
What do you think?

>
2.
A FATAL error would be raised if --subscriber-conninfo specifies non-local server.

Extra protection is always good. However, let's make sure this code path is
really useful. I'll think a bit about it.
>

OK, I can wait for your consideration. Note that if we follow the pg_upgrade
style, we may be able to reuse check_pghost_envvar().

>
3.
Options -o/-O were added to specify options for publications/subscriptions.

Flexibility is cool. However, I think the cost benefit of it is not good. You
have to parse the options to catch preliminary errors. Things like publish only
delete and subscription options that conflicts with the embedded ones are
additional sources of failure.
>

As I already replied, let's stop doing it for now. We can resume based on demand.

>
4.
Made standby to save their output to log file.

It was already done in v5. I did in a different way.
>

Good. I felt that yours was better. BTW, can we record the output of pg_subscriber to a file as well?
pg_upgrade does a similar thing. Thoughts?

>
5.
Unnecessary Assert in drop_replication_slot() was removed.

Instead, I fixed the code and keep the assert.
>

Cool.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Here are comments for your v5 patch.

01.
In the document, the two words target/standby are used with almost the same
meaning. Can you unify them?

02.
```
 <refsynopsisdiv>
  <cmdsynopsis>
   <command>pg_subscriber</command>
   <arg rep="repeat"><replaceable>option</replaceable></arg>
  </cmdsynopsis>
 </refsynopsisdiv>
```

There are some mandatory options like -D/-S/-P. They must be listed in the Synopsis section.

03.
```
  <para>
   <application>pg_subscriber</application> takes the publisher and subscriber
   connection strings, a cluster directory from a standby server and a list of
   database names and it sets up a new logical replica using the physical
   recovery process.
  </para>
```

I briefly checked other pages, and they do not describe the accepted options here.
A summary of the application should be given instead. Based on that, how about:

```
pg_subscriber creates a new <link linkend="logical-replication-subscription">
subscriber</link> from a physical standby server. This allows users to quickly
set up a logical replication system.
```

04.
```
  <para>
   The <application>pg_subscriber</application> should be run at the target
   server. The source server (known as publisher server) should accept logical
   replication connections from the target server (known as subscriber server).
   The target server should accept local logical replication connection.
  </para>
```

I'm not a native speaker, but these are not just recommendations - they are
strictly required. So, should we replace s/should/has to/?

05.
```
     <varlistentry>
      <term><option>-S <replaceable class="parameter">conninfo</replaceable></option></term>
      <term><option>--subscriber-conninfo=<replaceable class="parameter">conninfo</replaceable></option></term>
      <listitem>
       <para>
        The connection string to the subscriber. For details see <xref linkend="libpq-connstring"/>.
       </para>
      </listitem>
     </varlistentry>
```

I'm no longer sure about "The connection string to the subscriber."
The server is still a physical standby at that time.

06.
```
 * IDENTIFICATION
 *        src/bin/pg_subscriber/pg_subscriber.c
```

The identification is not correct.

07.
I felt that there are too many global variables, and that LogicalRepInfo should
be refactored, because...

* Some cluster-related info (e.g., subscriber_dir, conninfo, ...) should be
  gathered into one struct.
* pubconninfo/subconninfo are stored per database, but that is not needed if we
  have one base_conninfo.
* pubname/subname are not needed because we have a fixed naming rule.
* pg_ctl_path and pg_resetwal_path can be combined into one bindir.
* num_dbs should not stand alone.
...

Based on above, how about using structures like below?

```
typedef struct LogicalRepPerdbInfo
{
    Oid        oid;
    char   *dbname;
    bool    made_replslot;    /* replication slot was created */
    bool    made_publication;    /* publication was created */
    bool    made_subscription;    /* subscription was created */
} LogicalRepPerdbInfo;

typedef struct PrimaryInfo
{
    char   *base_conninfo;
    bool    made_transient_replslot;
} PrimaryInfo;

typedef struct StandbyInfo
{
    char   *base_conninfo;
    char   *bindir;
    char   *pgdata;
    char   *primary_slot_name;
} StandbyInfo;

typedef struct LogicalRepInfo
{
    int                        num_dbs;
    LogicalRepPerdbInfo       *perdb;
    PrimaryInfo               *primary;
    StandbyInfo               *standby;
} LogicalRepInfo;
```
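To illustrate the refactoring point: with one base_conninfo and a fixed naming rule, the per-database connection strings and object names can be derived on demand instead of being stored. A rough sketch in Python (the function names and the `pg_subscriber_<oid>` naming rule are illustrative assumptions, not the actual C code):

```python
# Sketch: derive per-database values from one base connection string
# and a fixed naming rule, instead of storing pubconninfo/subconninfo,
# pubname and subname for every database.
def conninfo_for_db(base_conninfo: str, dbname: str) -> str:
    """Append the database name to a base connection string."""
    return f"{base_conninfo} dbname={dbname}"

def object_name(prefix: str, dboid: int) -> str:
    """Fixed naming rule: object names are derived from the database OID."""
    return f"{prefix}_{dboid}"

base = "host=primary port=5432 fallback_application_name=pg_subscriber"
print(conninfo_for_db(base, "postgres"))
print(object_name("pg_subscriber", 5))  # e.g., a publication name
```

With helpers like these, LogicalRepPerdbInfo only needs the OID and dbname plus the cleanup flags.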

08.
```
    char       *subconninfo;    /* subscription connection string for logical
                                 * replication */
```

Not sure how we should phrase this, because the target is not a subscriber yet.

09.
```
enum WaitPMResult
{
    POSTMASTER_READY,
    POSTMASTER_STANDBY,
    POSTMASTER_STILL_STARTING,
    POSTMASTER_FAILED
};
```

This enum is already defined in pg_ctl.c, and I'm not sure we can use the same
name. Can we rename ours to PGSWaitPMResult, or export the pre-existing one?

10.
```
/* Options */
static const char *progname;
```

I think progname is not an option, so this comment is misleading.

11.
```
/*
 * Validate a connection string. Returns a base connection string that is a
 * connection string without a database name plus a fallback application name.
 * Since we might process multiple databases, each database name will be
 * appended to this base connection string to provide a final connection string.
 * If the second argument (dbname) is not null, returns dbname if the provided
 * connection string contains it. If option --database is not provided, uses
 * dbname as the only database to setup the logical replica.
 * It is the caller's responsibility to free the returned connection string and
 * dbname.
 */
static char *
get_base_conninfo(char *conninfo, char *dbname, const char *noderole)
```
Just FYI - adding fallback_application_name here may be too optimistic. Currently
the output is used by both pg_subscriber itself and the subscription connection.

12.
Can we add an option to keep the log files even when the operation succeeds?

13.
```
        /*
         * Since the standby server is running, check if it is using an
         * existing replication slot for WAL retention purposes. This
         * replication slot has no use after the transformation, hence, it
         * will be removed at the end of this process.
         */
        primary_slot_name = use_primary_slot_name();
```

Currently primary_slot_name is checked only when the server has been started, but
it should be checked in all cases.

14.
```
    consistent_lsn = create_logical_replication_slot(conn, &dbinfo[0],
                                                     temp_replslot);
```

Can we create a temporary slot here?
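For reference, the distinction matters: a temporary slot is dropped automatically when the creating session ends, which would fit the transient consistent-LSN slot but not the per-database slots that the subscriptions will keep using afterwards. A quick sketch of the SQL involved (slot names are illustrative; the actual patch builds its commands in C via PQExpBuffer):

```python
def create_slot_sql(slot_name: str, temporary: bool) -> str:
    """Build the SQL to create a logical slot with the pgoutput plugin.
    The third argument of pg_create_logical_replication_slot() makes the
    slot temporary, i.e. dropped automatically at session end."""
    return (
        "SELECT lsn FROM pg_create_logical_replication_slot("
        f"'{slot_name}', 'pgoutput', {'true' if temporary else 'false'})"
    )

# transient slot used only to obtain the consistent LSN: could be temporary
print(create_slot_sql("pg_subscriber_startpoint", True))
# per-database slots reused by the subscriptions: must be permanent
print(create_slot_sql("pg_subscriber_5_1234", False))
```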

15.
I found that subscriptions cannot be started if tuples are inserted on the publisher
after temp_replslot is created. After starting the subscriber, I got the output below in the log.

```
ERROR:  could not receive data from WAL stream: ERROR:  publication "pg_subscriber_5" does not exist
CONTEXT:  slot "pg_subscriber_5_3632", output plugin "pgoutput", in the change callback, associated LSN 0/30008A8
LOG:  background worker "logical replication apply worker" (PID 3669) exited with exit code 1
```

But this is strange: I confirmed that the specified publication does exist.
Do you know the reason?

```
publisher=# SELECT pubname FROM pg_publication;
     pubname
-----------------
 pg_subscriber_5
(1 row)
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: speed up a logical replica setup

From
Nazir Bilal Yavuz
Date:
Hi,

On Fri, 12 Jan 2024 at 09:32, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Good, all warnings were removed. However, the patch failed to pass tests on FreeBSD twice.
>  I'm quite not sure the ERROR, but is it related with us?

The FreeBSD errors started after FreeBSD's CI image was updated [1]. I do
not think the error is related to this patch.

[1] https://cirrus-ci.com/task/4700394639589376

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Jan 12, 2024 at 12:02 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> I didn't review all items but ...
>
> 1.
> An option --port was added to control the port number for physical standby.
> Users can specify a port number via the option, or an environment variable PGSUBPORT.
> If not specified, a fixed value (50111) would be used.
>
> My first reaction as a new user would be: why do I need to specify a port if my
> --subscriber-conninfo already contains a port? Ugh. I'm wondering if we can do
> it behind the scenes. Try a range of ports.
> >
>
> My initial motivation of the setting was to avoid establishing connections
> during the pg_subscriber. While considering more, I started to think that
> --subscriber-conninfo may not be needed. pg_upgrade does not requires the
> string: it requries username, and optionally port number (which would be used
> during the upgrade) instead. The advantage of this approach is that we do not
> have to parse the connection string.
> How do you think?
>

+1. This seems worth considering. I think unless we have a good reason
to have this parameter, we should try to avoid it.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Junwang Zhao
Date:
Hi Euler,

On Fri, Jan 12, 2024 at 6:16 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Jan 11, 2024, at 9:18 AM, Hayato Kuroda (Fujitsu) wrote:
>
> I have been concerned that the patch has not been tested by cfbot due to the
> application error. Also, some comments were raised. Therefore, I created a patch
> to move forward.
>
>
> Let me send an updated patch to hopefully keep the CF bot happy. The following
> items are included in this patch:
>
> * drop physical replication slot if standby is using one [1].
> * cleanup small changes (copyright, .gitignore) [2][3]
> * fix getopt_long() options [2]
> * fix format specifier for some messages
> * move doc to Server Application section [4]
> * fix assert failure
> * ignore duplicate database names [2]
> * store subscriber server log into a separate file
> * remove MSVC support
>
> I'm still addressing other reviews and I'll post another version that includes
> it soon.
>
> [1] https://www.postgresql.org/message-id/e02a2c17-22e5-4ba6-b788-de696ab74f1e%40app.fastmail.com
> [2] https://www.postgresql.org/message-id/CALDaNm1joke42n68LdegN5wCpaeoOMex2EHcdZrVZnGD3UhfNQ%40mail.gmail.com
> [3]
https://www.postgresql.org/message-id/TY3PR01MB98895BA6C1D72CB8582CACC4F5682%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [4]
https://www.postgresql.org/message-id/TY3PR01MB988978C7362A101927070D29F56A2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
>

+ <refnamediv>
+  <refname>pg_subscriber</refname>
+  <refpurpose>create a new logical replica from a standby server</refpurpose>
+ </refnamediv>
I'm a bit confused by this wording, because we are converting a standby
into a logical replica, not creating a new logical replica and leaving the
standby as is. How about:

<refpurpose>convert a standby replica to a logical replica</refpurpose>

+  <para>
+   The <application>pg_subscriber</application> should be run at the target
+   server. The source server (known as publisher server) should accept logical
+   replication connections from the target server (known as subscriber server).
+   The target server should accept local logical replication connection.
+  </para>

What is *local logical replication*? I can't find any clue in the patch; can you
give me a hint?


>
> --
> Euler Taveira
> EDB   https://www.enterprisedb.com/
>


--
Regards
Junwang Zhao



Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Thu, Dec 21, 2023 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 6, 2023 at 12:53 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
> >
> > On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > > On 08.11.23 00:12, Michael Paquier wrote:
> > >> - Should the subdirectory pg_basebackup be renamed into something more
> > >> generic at this point?  All these things are frontend tools that deal
> > >> in some way with the replication protocol to do their work.  Say
> > >> a replication_tools?
> > >
> > > Seems like unnecessary churn.  Nobody has complained about any of the other
> > > tools in there.
> >
> > Not sure.  We rename things across releases in the tree from time to
> > time, and here that's straight-forward.
> >
> >
> > Based on this discussion it seems we have a consensus that this tool should be
> > in the pg_basebackup directory. (If/when we agree with the directory renaming,
> > it could be done in a separate patch.) Besides this move, the v3 provides a dry
> > run mode. It basically executes every routine but skip when should do
> > modifications. It is an useful option to check if you will be able to run it
> > without having issues with connectivity, permission, and existing objects
> > (replication slots, publications, subscriptions). Tests were slightly improved.
> > Messages were changed to *not* provide INFO messages by default and --verbose
> > provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> > also refactored the connect_database() function into which the connection will
> > always use the logical replication mode. A bug was fixed in the transient
> > replication slot name. Ashutosh review [1] was included. The code was also indented.
> >
> > There are a few suggestions from Ashutosh [2] that I will reply in another
> > email.
> >
> > I'm still planning to work on the following points:
> >
> > 1. improve the cleanup routine to point out leftover objects if there is any
> >    connection issue.
> >
>
> I think this is an important part. Shall we try to write to some file
> the pending objects to be cleaned up? We do something like that during
> the upgrade.
>
> > 2. remove the physical replication slot if the standby is using one
> >    (primary_slot_name).
> > 3. provide instructions to promote the logical replica into primary, I mean,
> >    stop the replication between the nodes and remove the replication setup
> >    (publications, subscriptions, replication slots). Or even include another
> >    action to do it. We could add both too.
> >
> > Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> > UI for users that would like to use it.
> >
>
> Isn't point 2 also essential because how would otherwise such a slot
> be advanced or removed?
>
> A few other points:
> ==============
> 1. Previously, I asked whether we need an additional replication slot
> patch created to get consistent LSN and I see the following comment in
> the patch:
>
> + *
> + * XXX we should probably use the last created replication slot to get a
> + * consistent LSN but it should be changed after adding pg_basebackup
> + * support.
>
> Yeah, sure, we may want to do that after backup support and we can
> keep a comment for the same but I feel as the patch stands today,
> there is no good reason to keep it. Also, is there a reason that we
> can't create the slots after backup is complete and before we write
> recovery parameters
>
> 2.
> + appendPQExpBuffer(str,
> +   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
> +   "WITH (create_slot = false, copy_data = false, enabled = false)",
> +   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);
>
> Shouldn't we enable two_phase by default for newly created
> subscriptions? Is there a reason for not doing so?
>
> 3. How about sync slots on the physical standby if present? Do we want
> to retain those as it is or do we need to remove those? We are
> actively working on the patch [1] for the same.
>
> 4. Can we see some numbers with various sizes of databases (cluster)
> to see how it impacts the time for small to large-size databases as
> compared to the traditional method? This might help us with giving
> users advice on when to use this tool. We can do this bit later as
> well when the patch is closer to being ready for commit.

I have done the performance testing and attached the results comparing
the execution time of 'logical replication' and 'pg_subscriber' for
100MB, 1GB and 5GB of data:

                  | 100MB  | 1GB     | 5GB
Logical rep (2 w) | 1.815s | 14.895s | 75.541s
Logical rep (4 w) | 1.194s | 9.484s  | 46.938s
Logical rep (8 w) | 0.828s | 6.422s  | 31.704s
Logical rep (10 w)| 0.646s | 3.843s  | 18.425s
pg_subscriber     | 3.977s | 9.988s  | 12.665s

Here, 'w' stands for 'workers'. I ran the tests with values of
'max_sync_workers_per_subscription' ranging from 2 to 10, and with
different row counts: 300,000 rows for 100MB, 3,000,000 for 1GB, and
15,000,000 for 5GB. It is observed that 'pg_subscriber' performs better
as the table size grows.
Next I plan to run these tests for 10GB and 20GB to see whether this
trend continues.
I'm attaching the script files with the details of the test scripts
used; the Excel file has the test run details. The 'pg_subscriber.pl'
file is to test with 'pg_subscriber' and the 'logical_rep.pl' file is
to test with 'Logical Replication'.

Thanks and Regards,
Shubham Khanna.

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler, hackers,

I found that some bugs reported by Shlok were not fixed, so I made a
top-up patch. 0001 is unchanged, and 0002 contains the following:

* Add a timeout option for the recovery option, per [1]. The code was basically ported from pg_ctl.c.
* Reject if the target server is not a standby, per [2]
* Raise a FATAL error if --subscriber-conninfo specifies a non-local server, per [3]
  (not sure it is really needed, so feel free to reject that part.)

Feel free to merge parts of 0002 if it looks good to you.
Thanks to Shlok for making part of the patch.

[1]: https://www.postgresql.org/message-id/CANhcyEUCt-g4JLQU3Q3ofFk_Vt-Tqh3ZdXoLcpT8fjz9LY_-ww%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CANhcyEUCt-g4JLQU3Q3ofFk_Vt-Tqh3ZdXoLcpT8fjz9LY_-ww%40mail.gmail.com
[3]:
https://www.postgresql.org/message-id/TY3PR01MB98895BA6C1D72CB8582CACC4F5682%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

I have some comments for the v5 patch:

1.
```
+ base_dir = (char *) pg_malloc0(MAXPGPATH);
+ len = snprintf(base_dir, MAXPGPATH, "%s/%s", subscriber_dir, PGS_OUTPUT_DIR);
```
Before these lines, I think we should use
'canonicalize_path(subscriber_dir)' to remove unnecessary characters.
This function is used in many places, such as initdb.c, pg_ctl.c,
pg_basebackup.c, etc.
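For illustration, a rough Python analogue of what canonicalize_path() achieves for Unix paths (the real C function also handles Windows separators, drive prefixes, and "." / ".." components explicitly):

```python
import posixpath

def canonicalize(path: str) -> str:
    """Rough analogue of PostgreSQL's canonicalize_path(): collapse
    duplicate separators and '.' components, drop a trailing slash."""
    return posixpath.normpath(path)

print(canonicalize("/data//subscriber/"))   # -> /data/subscriber
print(canonicalize("/data/./subscriber"))   # -> /data/subscriber
```

Without this cleanup, a user-supplied path like "/data/subscriber/" would produce "/data/subscriber//pg_subscriber_output.d"-style doubled separators in the snprintf above.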

2.
I also feel that there are many global variables that could be arranged
into structures, as suggested by Kuroda-san in [1]

[1]
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Thanks and Regards
Shlok Kyal


On Fri, 12 Jan 2024 at 03:46, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Jan 11, 2024, at 9:18 AM, Hayato Kuroda (Fujitsu) wrote:
>
> I have been concerned that the patch has not been tested by cfbot due to the
> application error. Also, some comments were raised. Therefore, I created a patch
> to move forward.
>
>
> Let me send an updated patch to hopefully keep the CF bot happy. The following
> items are included in this patch:
>
> * drop physical replication slot if standby is using one [1].
> * cleanup small changes (copyright, .gitignore) [2][3]
> * fix getopt_long() options [2]
> * fix format specifier for some messages
> * move doc to Server Application section [4]
> * fix assert failure
> * ignore duplicate database names [2]
> * store subscriber server log into a separate file
> * remove MSVC support
>
> I'm still addressing other reviews and I'll post another version that includes
> it soon.
>
> [1] https://www.postgresql.org/message-id/e02a2c17-22e5-4ba6-b788-de696ab74f1e%40app.fastmail.com
> [2] https://www.postgresql.org/message-id/CALDaNm1joke42n68LdegN5wCpaeoOMex2EHcdZrVZnGD3UhfNQ%40mail.gmail.com
> [3]
https://www.postgresql.org/message-id/TY3PR01MB98895BA6C1D72CB8582CACC4F5682%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [4]
https://www.postgresql.org/message-id/TY3PR01MB988978C7362A101927070D29F56A2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
>
>
> --
> Euler Taveira
> EDB   https://www.enterprisedb.com/
>



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 11.01.24 23:15, Euler Taveira wrote:
> A new tool called pg_subscriber can convert a physical replica into a 
> logical replica. It runs on the target server and should be able to 
> connect to the source server (publisher) and the target server (subscriber).

Can we have a discussion on the name?

I find the name pg_subscriber too general.

The replication/backup/recovery tools in PostgreSQL are usually named 
along the lines of "verb - object".  (Otherwise, they would all be 
called "pg_backup"??)  Moreover, "pg_subscriber" also sounds like the 
name of the program that runs the subscriber itself, like what the 
walreceiver does now.

Very early in this thread, someone mentioned the name 
pg_create_subscriber, and of course there is pglogical_create_subscriber 
as the historical predecessor.  Something along those lines seems better 
to me.  Maybe there are other ideas.




Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Jan 18, 2024 at 2:49 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 11.01.24 23:15, Euler Taveira wrote:
> > A new tool called pg_subscriber can convert a physical replica into a
> > logical replica. It runs on the target server and should be able to
> > connect to the source server (publisher) and the target server (subscriber).
>
> Can we have a discussion on the name?
>
> I find the name pg_subscriber too general.
>
> The replication/backup/recovery tools in PostgreSQL are usually named
> along the lines of "verb - object".  (Otherwise, they would all be
> called "pg_backup"??)  Moreover, "pg_subscriber" also sounds like the
> name of the program that runs the subscriber itself, like what the
> walreceiver does now.
>
> Very early in this thread, someone mentioned the name
> pg_create_subscriber, and of course there is pglogical_create_subscriber
> as the historical predecessor.  Something along those lines seems better
> to me.  Maybe there are other ideas.
>

The other option could be pg_createsubscriber on the lines of
pg_verifybackup and pg_combinebackup. Yet other options could be
pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
sounds like we are doing some work to create the subscriber which I
think is the case here.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Tue, Jan 16, 2024 at 11:58 AM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Thu, Dec 21, 2023 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > 4. Can we see some numbers with various sizes of databases (cluster)
> > to see how it impacts the time for small to large-size databases as
> > compared to the traditional method? This might help us with giving
> > users advice on when to use this tool. We can do this bit later as
> > well when the patch is closer to being ready for commit.
>
> I have done the Performance testing and attached the results to
> compare the 'Execution Time' between 'logical replication' and
> 'pg_subscriber' for 100MB, 1GB and 5GB data:
>                             | 100MB | 1GB      | 5GB
> Logical rep (2 w) | 1.815s  | 14.895s | 75.541s
> Logical rep (4 w) | 1.194s  | 9.484s   | 46.938s
> Logical rep (8 w) | 0.828s  | 6.422s   | 31.704s
> Logical rep(10 w)| 0.646s  | 3.843s   | 18.425s
> pg_subscriber     | 3.977s  | 9.988s   | 12.665s
>
> Here, 'w' stands for 'workers'. I have included the tests to see the
> test result variations with different values for
> 'max_sync_workers_per_subscription' ranging from 2 to 10. I ran the
> tests for different data records; for 100MB I put  3,00,000 Records,
> for 1GB I put 30,00,000 Records and for 5GB I put 1,50,00,000 Records.
> It is observed that 'pg_subscriber' is better when the table size is
> more.
>

Thanks for the tests. IIUC, they show that for smaller data this tool can
take more time. Can we run perf to see if there is something we can do
to reduce the overhead?

> Next I plan to run these tests for 10GB and 20GB to see if this trend
> continues or not.
>

Okay, that makes sense.

With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Tue, Jan 16, 2024 at 11:58 AM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Thu, Dec 21, 2023 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 6, 2023 at 12:53 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
> > >
> > > On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > > > On 08.11.23 00:12, Michael Paquier wrote:
> > > >> - Should the subdirectory pg_basebackup be renamed into something more
> > > >> generic at this point?  All these things are frontend tools that deal
> > > >> in some way with the replication protocol to do their work.  Say
> > > >> a replication_tools?
> > > >
> > > > Seems like unnecessary churn.  Nobody has complained about any of the other
> > > > tools in there.
> > >
> > > Not sure.  We rename things across releases in the tree from time to
> > > time, and here that's straight-forward.
> > >
> > >
> > > Based on this discussion it seems we have a consensus that this tool should be
> > > in the pg_basebackup directory. (If/when we agree with the directory renaming,
> > > it could be done in a separate patch.) Besides this move, the v3 provides a dry
> > > run mode. It basically executes every routine but skip when should do
> > > modifications. It is an useful option to check if you will be able to run it
> > > without having issues with connectivity, permission, and existing objects
> > > (replication slots, publications, subscriptions). Tests were slightly improved.
> > > Messages were changed to *not* provide INFO messages by default and --verbose
> > > provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> > > also refactored the connect_database() function into which the connection will
> > > always use the logical replication mode. A bug was fixed in the transient
> > > replication slot name. Ashutosh review [1] was included. The code was also indented.
> > >
> > > There are a few suggestions from Ashutosh [2] that I will reply in another
> > > email.
> > >
> > > I'm still planning to work on the following points:
> > >
> > > 1. improve the cleanup routine to point out leftover objects if there is any
> > >    connection issue.
> > >
> >
> > I think this is an important part. Shall we try to write to some file
> > the pending objects to be cleaned up? We do something like that during
> > the upgrade.
> >
> > > 2. remove the physical replication slot if the standby is using one
> > >    (primary_slot_name).
> > > 3. provide instructions to promote the logical replica into primary, I mean,
> > >    stop the replication between the nodes and remove the replication setup
> > >    (publications, subscriptions, replication slots). Or even include another
> > >    action to do it. We could add both too.
> > >
> > > Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> > > UI for users that would like to use it.
> > >
> >
> > Isn't point 2 also essential because how would otherwise such a slot
> > be advanced or removed?
> >
> > A few other points:
> > ==============
> > 1. Previously, I asked whether we need an additional replication slot
> > patch created to get consistent LSN and I see the following comment in
> > the patch:
> >
> > + *
> > + * XXX we should probably use the last created replication slot to get a
> > + * consistent LSN but it should be changed after adding pg_basebackup
> > + * support.
> >
> > Yeah, sure, we may want to do that after backup support and we can
> > keep a comment for the same but I feel as the patch stands today,
> > there is no good reason to keep it. Also, is there a reason that we
> > can't create the slots after backup is complete and before we write
> > recovery parameters
> >
> > 2.
> > + appendPQExpBuffer(str,
> > +   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
> > +   "WITH (create_slot = false, copy_data = false, enabled = false)",
> > +   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);
> >
> > Shouldn't we enable two_phase by default for newly created
> > subscriptions? Is there a reason for not doing so?
> >
> > 3. How about sync slots on the physical standby if present? Do we want
> > to retain those as it is or do we need to remove those? We are
> > actively working on the patch [1] for the same.
> >
> > 4. Can we see some numbers with various sizes of databases (cluster)
> > to see how it impacts the time for small to large-size databases as
> > compared to the traditional method? This might help us with giving
> > users advice on when to use this tool. We can do this bit later as
> > well when the patch is closer to being ready for commit.
>
> I have done the Performance testing and attached the results to
> compare the 'Execution Time' between 'logical replication' and
> 'pg_subscriber' for 100MB, 1GB and 5GB data:
>                             | 100MB | 1GB      | 5GB
> Logical rep (2 w) | 1.815s  | 14.895s | 75.541s
> Logical rep (4 w) | 1.194s  | 9.484s   | 46.938s
> Logical rep (8 w) | 0.828s  | 6.422s   | 31.704s
> Logical rep(10 w)| 0.646s  | 3.843s   | 18.425s
> pg_subscriber     | 3.977s  | 9.988s   | 12.665s
>
> Here, 'w' stands for 'workers'. I have included the tests to see the
> test result variations with different values for
> 'max_sync_workers_per_subscription' ranging from 2 to 10. I ran the
> tests for different data records; for 100MB I put  3,00,000 Records,
> for 1GB I put 30,00,000 Records and for 5GB I put 1,50,00,000 Records.
> It is observed that 'pg_subscriber' is better when the table size is
> more.
> Next I plan to run these tests for 10GB and 20GB to see if this trend
> continues or not.

I have done the performance testing and attached the results comparing
the execution time of 'logical replication' and 'pg_subscriber' for
10GB and 20GB of data:

                  | 10GB     | 20GB
Logical rep (2 w) | 157.131s | 343.191s
Logical rep (4 w) | 116.627s | 240.480s
Logical rep (8 w) | 95.237s  | 275.715s
Logical rep (10 w)| 92.792s  | 280.538s
pg_subscriber     | 22.734s  | 25.661s

As expected, pg_subscriber is much better in these cases: approximately
7x faster for 10GB and 13x faster for 20GB.
I'm attaching the script files with the details of the test scripts
used; the Excel file has the test run details. The 'pg_subscriber.pl'
file is for 'Streaming Replication' and the 'logical_replication.pl'
file is for 'Logical Replication'.
Note: For 20GB the record count should be changed to 60,000,000, and
'max_sync_workers_per_subscription' needs to be adjusted for the
logical replication tests with different worker counts.
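Taking the 2-worker logical replication times from the table above as the baseline, the claimed speedups check out (a quick arithmetic check; numbers copied from the table):

```python
# Execution times (seconds): 2-worker logical replication vs pg_subscriber.
times = {
    "10GB": {"logical_rep_2w": 157.131, "pg_subscriber": 22.734},
    "20GB": {"logical_rep_2w": 343.191, "pg_subscriber": 25.661},
}
for size, t in times.items():
    ratio = t["logical_rep_2w"] / t["pg_subscriber"]
    print(f"{size}: pg_subscriber is about {ratio:.1f}x faster")
```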

Thanks and Regards,
Shubham Khanna.

Attachment

Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 18.01.24 10:37, Amit Kapila wrote:
> The other option could be pg_createsubscriber on the lines of
> pg_verifybackup and pg_combinebackup.

Yes, that spelling would be more consistent.

> Yet other options could be
> pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
> sounds like we are doing some work to create the subscriber which I
> think is the case here.

I see your point here.  pg_createsubscriber is not like createuser in 
that it just runs an SQL command.  It does something different than 
CREATE SUBSCRIBER.  So a different verb would make that clearer.  Maybe 
something from here: https://www.thesaurus.com/browse/convert




RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

>
> 15.
> I found that subscriptions cannot be started if tuples are inserted on publisher
> after creating temp_replslot. After starting a subscriber, I got below output on the
> log.
>
> ```
> ERROR:  could not receive data from WAL stream: ERROR:  publication
> "pg_subscriber_5" does not exist
> CONTEXT:  slot "pg_subscriber_5_3632", output plugin "pgoutput", in the change
> callback, associated LSN 0/30008A8
> LOG:  background worker "logical replication apply worker" (PID 3669) exited
> with exit code 1
> ```
>
> But this is strange. I confirmed that the specified publication surely exists.
> Do you know the reason?
>
> ```
> publisher=# SELECT pubname FROM pg_publication;
>      pubname
> -----------------
>  pg_subscriber_5
> (1 row)
> ```
>

I analyzed it and found the reason: the publication is invisible to some
transactions.

To begin with, the operations below were executed in this case. Tuples
were inserted after the consistent_lsn was obtained but before the standby
was started. After running the workload, I confirmed again that the
publication had been created.

1. on primary, logical replication slots were created.
2. on primary, another replication slot was created.
3. ===on primary, some tuples were inserted. ===
4. on standby, a server process was started
5. on standby, the process waited until all changes had arrived.
6. on primary, publications were created.
7. on standby, subscriptions were created.
8. on standby, the replication progress for each subscription was set to the given LSN (obtained at step 2).
=====pg_subscriber finished here=====
9. on standby, a server process was started again
10. on standby, subscriptions were enabled. They referred to the slots created at step 1.
11. on primary, decoding started but an ERROR was raised.

In this case, tuples were inserted *before the publication was created*.
So the decoded transaction could not see the publication, because the
publication was committed after the insertions.

One solution is to create the publication before creating the consistent
slot. Changes that arrived before the slot was created are certainly
replicated to the standby, so subsequent transactions can see the object.
We are planning a patch set that fixes the issue using this approach.
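The visibility problem can be sketched with a toy model: logical decoding sees the catalog as of each decoded transaction, so a change made before the publication's creating transaction committed cannot see the publication (LSN values below are purely illustrative):

```python
# Toy model: a catalog object is visible to a decoded change only if the
# object was created at an earlier LSN than the change itself.  This
# mirrors why decoding the early INSERT fails with "publication does not
# exist" even though the publication exists by the time we look.
def publication_visible(publication_lsn: int, change_lsn: int) -> bool:
    return publication_lsn <= change_lsn

INSERT_LSN = 110          # step 3: tuples inserted
PUBLICATION_LSN = 120     # step 6: publication created (too late)

# decoding the insert: publication not yet visible -> ERROR
assert not publication_visible(PUBLICATION_LSN, INSERT_LSN)

# proposed fix: create the publication before the consistent slot, so
# every change the slot can ever decode is later than the publication
PUBLICATION_LSN_FIXED = 90
assert publication_visible(PUBLICATION_LSN_FIXED, INSERT_LSN)
print("with the reordering, the publication is visible to all decoded changes")
```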


Best Regards,
Hayato Kuroda
FUJITSU LIMITED




RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Peter,

> 
> > Yet other options could be
> > pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
> > sounds like we are doing some work to create the subscriber which I
> > think is the case here.
> 
> I see your point here.  pg_createsubscriber is not like createuser in
> that it just runs an SQL command.  It does something different than
> CREATE SUBSCRIBER.  So a different verb would make that clearer.  Maybe
> something from here: https://www.thesaurus.com/browse/convert

I read the link and found a good verb "switch". So, how about using "pg_switchsubscriber"?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Jan 22, 2024 at 2:38 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
> >
> > > Yet other options could be
> > > pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
> > > sounds like we are doing some work to create the subscriber which I
> > > think is the case here.
> >
> > I see your point here.  pg_createsubscriber is not like createuser in
> > that it just runs an SQL command.  It does something different than
> > CREATE SUBSCRIBER.

Right.

> > So a different verb would make that clearer.  Maybe
> > something from here: https://www.thesaurus.com/browse/convert
>
> I read the link and found a good verb "switch". So, how about using "pg_switchsubscriber"?
>

I also initially thought along these lines and came up with a name like
pg_convertsubscriber, but didn't feel strongly about it, as it would only
have sounded meaningful as something like pg_convertstandbytosubscriber,
which is too long. Having said that, I am not opposed to a name along
those lines. BTW, another option that occurred to me today is
pg_preparesubscriber. We internally create slots and then wait for WAL,
etc., which makes me think adding 'prepare' to the name could also
explain the purpose.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Dear Euler, hackers,

We have addressed some of the comments posted in the thread, creating the
fixes as top-up patches 0002 and 0003.

The 0002 patch contains the following changes:
* Add a timeout option for the recovery step, per [1]. The code was
basically ported from pg_ctl.c.
* Reject the target server if it is not a standby, per [2]
* Raise a FATAL error if --subscriber-conninfo specifies a non-local server, per [3]
  (not sure it is really needed, so feel free to reject that part.)
* Add checks for max_replication_slots and wal_level, per [4]
* Add -u and -p options, per [5]
* Address all comments except 5 and 8 in [6] and the comment in [7]

The 0003 patch contains a fix for the bug reported in [8].

Feel free to merge parts of 0002 and 0003 if they look good to you.
Thanks to Kuroda-san for making patch 0003 and part of patch 0002.

[1]: https://www.postgresql.org/message-id/CANhcyEUCt-g4JLQU3Q3ofFk_Vt-Tqh3ZdXoLcpT8fjz9LY_-ww%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CANhcyEUCt-g4JLQU3Q3ofFk_Vt-Tqh3ZdXoLcpT8fjz9LY_-ww%40mail.gmail.com
[3]:
https://www.postgresql.org/message-id/TY3PR01MB98895BA6C1D72CB8582CACC4F5682%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[4]: https://www.postgresql.org/message-id/CALDaNm098Jkbh%2Bye6zMj9Ro9j1bBe6FfPV80BFbs1%3DpUuTJ07g%40mail.gmail.com
[5]: https://www.postgresql.org/message-id/CAA4eK1JB_ko7a5JMS3WfAn583RadAKCDhiE9JgmfMA8ZZ5xcQw%40mail.gmail.com
[6]:
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[7]: https://www.postgresql.org/message-id/CANhcyEXjGmryoZPACS_i-joqvcz5e6Zb3u4g38SAy_iSTGhShg%40mail.gmail.com
[8]:
https://www.postgresql.org/message-id/TY3PR01MB9889C5D55206DDD978627D07F5752%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Thanks and regards
Shlok Kyal

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

> We fixed some of the comments posted in the thread. We have created it
> as top-up patch 0002 and 0003.

I found that the CFbot raised an ERROR. Also, it may not work well in the case
of a production build.
PSA the fixed patch set.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Jan 22, 2024, at 6:22 AM, Amit Kapila wrote:
On Mon, Jan 22, 2024 at 2:38 PM Hayato Kuroda (Fujitsu)
> >
> > > Yet other options could be
> > > pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
> > > sounds like we are doing some work to create the subscriber which I
> > > think is the case here.
> >
> > I see your point here.  pg_createsubscriber is not like createuser in
> > that it just runs an SQL command.  It does something different than
> > CREATE SUBSCRIBER.

Right.

Subscriber has a different meaning than subscription. Subscription is an SQL
object. Subscriber is the server (node in replication terminology) where the
subscription resides. Having said that, pg_createsubscriber doesn't seem a bad
name because you are creating a new subscriber. (Indeed, you are transforming /
converting, but "create" seems closer, and users can infer that it is a tool to
build a new logical replica.)

  So a different verb would make that clearer.  Maybe
>
> I read the link and found a good verb "switch". So, how about using "pg_switchsubscriber"?
>

I also initially thought along these lines and came up with a name like
pg_convertsubscriber, but didn't feel strongly about it, as that would only
have sounded meaningful with a name like pg_convertstandbytosubscriber,
which is too long. Having said that, I am not opposed to a name along those
lines. BTW, another option that occurred to me today is pg_preparesubscriber.
We internally create slots and then wait for WAL, etc., which makes me think
that adding 'prepare' to the name could also convey the purpose.

I think "convert" and "transform" fit this case. However, "create",
"convert" and "transform" have 6, 7 and 9 characters, respectively. I suggest
that we avoid long names (subscriber alone already has 10 characters). My
preference is pg_createsubscriber.


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Jan 22, 2024, at 4:06 AM, Hayato Kuroda (Fujitsu) wrote:
I analyzed it and found the reason. This is because publications are invisible to some transactions.

To begin with, the operations below were executed in this case.
Tuples were inserted after getting the consistent_lsn, but before starting the standby.
After running the workload, I confirmed again that the publication had been created.

1. on primary, logical replication slots were created.
2. on primary, another replication slot was created.
3. ===on primary, some tuples were inserted. ===
4. on standby, a server process was started
5. on standby, the process waited until all changes had arrived.
6. on primary, publications were created.
7. on standby, subscriptions were created.
8. on standby, the replication progress for each subscription was set to the given LSN (obtained at step 2).
=====pg_subscriber finished here=====
9. on standby, a server process was started again
10. on standby, subscriptions were enabled. They referred to the slots created at step 1.
11. on primary, decoding was started but an ERROR was raised.

Good catch! It is a design flaw.

In this case, tuples were inserted *before creating the publication*.
So I thought that the decoded transaction could not see the publication because
the publication was committed after the insertions.

One solution is to create the publication before creating the consistent slot.
Changes that came before creating the slot were surely replicated to the standby,
so upcoming transactions can see the object. We are planning a patch set that
fixes the issue using this approach.

I'll include a similar code in the next patch and also explain why we should
create the publication earlier. (I'm renaming
create_all_logical_replication_slots to setup_publisher and calling
create_publication from there and also adding the proposed GUC checks in it.)


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Jan 22, 2024, at 6:30 AM, Shlok Kyal wrote:
We have addressed some of the comments posted in the thread, creating the
fixes as top-up patches 0002 and 0003.

Cool.

The 0002 patch contains the following changes:
* Add a timeout option for the recovery step, per [1]. The code was
basically ported from pg_ctl.c.
* Reject the target server if it is not a standby, per [2]
* Raise a FATAL error if --subscriber-conninfo specifies a non-local server, per [3]
  (not sure it is really needed, so feel free to reject that part.)
* Add checks for max_replication_slots and wal_level, per [4]
* Add -u and -p options, per [5]
* Address all comments except 5 and 8 in [6] and the comment in [7]

My suggestion is that you create separate patches for each change. It helps
with review and alternative proposals. Some of these items conflict with what I
have in my local branch, and removing one of them is time-consuming. For this
one, I did the job, but let's avoid rework.

The 0003 patch contains a fix for the bug reported in [8].

LGTM. As I said in the other email, I included it.

I'll post a new one soon.


--
Euler Taveira

Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Tue, Jan 23, 2024 at 7:41 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > We fixed some of the comments posted in the thread. We have created it
> > as top-up patch 0002 and 0003.
>
> I found that the CFbot raised an ERROR. Also, it may not work well in case
> of production build.
> PSA the fixed patch set.

A segmentation fault was found when testing the following command (there is
an extra '/' between 'new_standby2' and '-P'): '$ gdb --args
./pg_subscriber -D ../new_standby2 / -P "host=localhost
port=5432 dbname=postgres" -d postgres'
While executing the above command, I got the following error:
pg_subscriber: error: too many command-line arguments (first is "/")
pg_subscriber: hint: Try "pg_subscriber --help" for more information.

Program received signal SIGSEGV, Segmentation fault.
0x0000555555557e5b in cleanup_objects_atexit () at pg_subscriber.c:173
173            if (perdb->made_subscription)
(gdb) p perdb
$1 = (LogicalRepPerdbInfo *) 0x0

Thanks and Regards,
Shubham Khanna.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

>  Segmentation fault was found after testing the given command(There is
> an extra '/' between 'new_standby2' and '-P') '$ gdb --args
> ./pg_subscriber -D ../new_standby2 / -P "host=localhost
> port=5432 dbname=postgres" -d postgres'
> While executing the above command, I got the following error:
> pg_subscriber: error: too many command-line arguments (first is "/")
> pg_subscriber: hint: Try "pg_subscriber --help" for more information.
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000555555557e5b in cleanup_objects_atexit () at pg_subscriber.c:173
> 173            if (perdb->made_subscription)
> (gdb) p perdb
> $1 = (LogicalRepPerdbInfo *) 0x0
>

Good catch; I could reproduce the issue. This crash occurred because the
cleanup function was called before the memory was initialized.

There are several ways to fix it, but I chose to move the callback registration
later. The function does actual work only after database objects are created,
so patch 0004 registers the callback just before creating them; the memory
allocation has already been done at that point. If required, an Assert() can be
added in the callback.

Can you test it and confirm that the issue is solved?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

Based on the requirement, I have profiled the performance test. It showed that
in the small-data case there are mainly two bottlenecks: starting a server and
waiting until the recovery is done.

# Tested source code

The v7 patch set was applied atop HEAD (0eb23285). No configure options were
specified when it was built.

# Tested workload

I focused only on the 100MB/1GB cases because the bigger ones already had good
performance (the number of inserted tuples was the same as in previous tests).
I used a bash script instead of the TAP test framework. See attached. The
executed SQL and operations were almost the same.

As you can see, I tested only the one-db case. Results may change if the number
of databases is changed.

# Measurement
Some debug logs that output the current time were added (please see the diff
file). I picked some events and took measurements before and after them. The
bullets below show the measured events:

* Starting a server
* Stopping a server
* Creating replication slots
* Creating publications
* Waiting until the recovery ended
* Creating subscriptions

# Result
The table below shows the elapsed time for these events. Raw data is also
available in the attached Excel file.

|Event category                  |100MB case [sec]|1GB [sec]|
|Starting a server               |1.414           |1.417    |
|Stopping a server               |0.506           |0.506    |
|Creating replication slots      |0.005           |0.007    |
|Creating publications           |0.001           |0.002    |
|Waiting until the recovery ended|1.603           |14.529   |
|Creating subscriptions          |0.012           |0.012    |
|Total                           |3.541           |16.473   |
|actual time                     |4.37            |17.271   |


As you can see, starting servers and waiting seem slow. We cannot omit these,
but setting a smaller shared_buffers would reduce the start time. One approach
is to override the GUC with a smaller value, but I don't think we can determine
an appropriate value.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

Here are comments for v8 patch set. I may revise them by myself,
but I want to post here to share all of them.

01.
```
/* Options */
static char *pub_conninfo_str = NULL;
static SimpleStringList database_names = {NULL, NULL};
static int    wait_seconds = DEFAULT_WAIT;
static bool retain = false;
static bool dry_run = false;
```

Just to confirm - is there a policy on how we store the specified options? If
we store them as globals, username and port should follow suit (my fault...).
Or should we have a structure to store them?

02.
```
{"subscriber-conninfo", required_argument, NULL, 'S'},
```

This is my fault, but "--subscriber-conninfo" still remains. It should be
removed if it is not really needed.

03.
```
{"username", required_argument, NULL, 'u'},
```

Should we accept 'U' instead of 'u'?

04.
```
{"dry-run", no_argument, NULL, 'n'},
```

I'm not sure why the dry_run mode exists. For pg_resetwal, it shows which
values would be changed based on the input. As for pg_upgrade, it checks
whether the node can be upgraded. I think we should have the checking feature,
so it should be renamed to --check. Also, the process should exit earlier in
that case.

05.
I felt we should accept some settings from environment variables, like pg_upgrade.
Currently, the items below should be accepted.

- data directory
- username
- port
- timeout

06.
```
pg_logging_set_level(PG_LOG_WARNING);
```

If the default log level is warning, there is no way to output debug logs
(a single -v option only raises the level by one, so INFO would be output).
I think it should be PG_LOG_INFO.

07.
Can we combine verifications into two functions, e.g., check_primary() and check_standby/check_subscriber()?

08.
Not sure, but if we want to record the output of pg_subscriber, a sub-directory
should be created. Its name should contain the timestamp.

09.
Not sure, but should we check max_slot_wal_keep_size on the primary server? It
can avoid a failure when starting logical replication.

10.
```
    nslots_new = nslots_old + dbarr.ndbs;

    if (nslots_new > max_replication_slots)
    {
        pg_log_error("max_replication_slots (%d) must be greater than or equal to "
                     "the number of replication slots required (%d)", max_replication_slots, nslots_new);
        exit(1);
    }
```

I think the standby server must not have replication slots, because the
subsequent pg_resetwal command discards all the WAL files, so the WAL records
they point to are removed. Currently pg_resetwal does not raise an ERROR in
that case.

11.
```
    /*
     * Stop the subscriber if it is a standby server. Before executing the
     * transformation steps, make sure the subscriber is not running because
     * one of the steps is to modify some recovery parameters that require a
     * restart.
     */
    if (stat(pidfile, &statbuf) == 0)
```

I kept it just in case, but I'm not sure it is still needed. What do you think?
Removing it would remove the need to include pidfile.h.

12.
```
        pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s",
                              standby.bindir, standby.pgdata);
        rc = system(pg_ctl_cmd);
        pg_ctl_status(pg_ctl_cmd, rc, 0);
```


There are two places to stop the instance. Can you divide it into a function?

13.
```
     * A temporary replication slot is not used here to avoid keeping a
     * replication connection open (depending when base backup was taken, the
     * connection should be open for a few hours).
     */
    conn = connect_database(primary.base_conninfo, dbarr.perdb[0].dbname);
    if (conn == NULL)
        exit(1);
    consistent_lsn = create_logical_replication_slot(conn, true,
                                                     &dbarr.perdb[0]);
```

I hadn't noticed the comment, and I'm still not sure of the reason. Why must we
keep the slot until pg_subscriber finishes? IIUC, the slot is never used; it is
created only to get a consistent_lsn, so we do not have to keep it.
Also, just before that, logical replication slots for each database are created,
so the WAL records are surely retained.

14.

```
    pg_log_info("starting the subscriber");
    start_standby_server(&standby, subport, server_start_log);
```

This info should be in the function.

15.
```
    /*
     * Create a subscription for each database.
     */
    for (i = 0; i < dbarr.ndbs; i++)
```

This can be divided into a function, like create_all_subscriptions().

16.
My fault: usage() must be updated.

17. use_primary_slot_name
```
    if (PQntuples(res) != 1)
    {
        pg_log_error("could not obtain replication slot information: got %d rows, expected %d row",
                     PQntuples(res), 1);
        return NULL;
    }
```

The error message should be changed. I think this error means the standby has a wrong primary_slot_name, right?

18. misc
Sometimes the pid of pg_subscriber is referred to. It could be stored in a global variable.

19.
C99-style has been allowed, so loop variables like "i" can be declared in the for-statement, like

```
for (int i = 0; i < MAX; i++)
```

20.
Some comments, docs, and outputs must be fixed when the name is changed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 24.01.24 00:44, Euler Taveira wrote:
> Subscriber has a different meaning of subscription. Subscription is an SQL
> object. Subscriber is the server (node in replication terminology) where the
> subscription resides. Having said that pg_createsubscriber doesn't seem 
> a bad
> name because you are creating a new subscriber. (Indeed, you are 
> transforming /
> converting but "create" seems closer and users can infer that it is a 
> tool to
> build a new logical replica.

That makes sense.

(Also, the problem with "convert" etc. is that "convertsubscriber" would 
imply that you are converting an existing subscriber to something else. 
It would need to be something like "convertbackup" then, which doesn't 
seem helpful.)

> I think "convert" and "transform" fit for this case. However, "create",
> "convert" and "transform" have 6, 7 and 9 characters,  respectively. I 
> suggest
> that we avoid long names (subscriber already has 10 characters). My 
> preference
> is pg_createsubscriber.

That seems best to me.




RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Peter,

Thanks for giving your idea!

> > Subscriber has a different meaning of subscription. Subscription is an SQL
> > object. Subscriber is the server (node in replication terminology) where the
> > subscription resides. Having said that pg_createsubscriber doesn't seem
> > a bad
> > name because you are creating a new subscriber. (Indeed, you are
> > transforming /
> > converting but "create" seems closer and users can infer that it is a
> > tool to
> > build a new logical replica.
> 
> That makes sense.
> 
> (Also, the problem with "convert" etc. is that "convertsubscriber" would
> imply that you are converting an existing subscriber to something else.
> It would need to be something like "convertbackup" then, which doesn't
> seem helpful.)
> 
> > I think "convert" and "transform" fit for this case. However, "create",
> > "convert" and "transform" have 6, 7 and 9 characters,  respectively. I
> > suggest
> > that we avoid long names (subscriber already has 10 characters). My
> > preference
> > is pg_createsubscriber.
> 
> That seems best to me.

Just FYI - I'm ok to change the name to pg_createsubscriber.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Jan 23, 2024, at 10:29 PM, Euler Taveira wrote:
I'll post a new one soon.

I'm attaching another patch that fixes some of the issues pointed out by
Hayato, Shlok, and Junwang.

* publication doesn't exist. The analysis [1] was done by Hayato but I didn't
  use the proposed patch. Instead I refactored the code a bit [2] and call it
  from a new function (setup_publisher) that is called before the promotion.
* fix wrong path name in the initial comment [3]
* change terminology: logical replica -> physical replica [3]
* primary / standby ready for logical replication? setup_publisher() and
  setup_subscriber() check whether the required GUCs are set accordingly. For
  the primary, it checks that wal_level = logical, that max_replication_slots
  has enough remaining replication slots for the proposed setup, and that
  max_wal_senders has senders available. For the standby, it checks
  max_replication_slots for the replication origins and also the remaining
  number of background workers to start the subscriber.
* retain option: I extracted this one from Hayato's patch [4]
* target server must be a standby. It seems we agree that this restriction
  simplifies the code a bit but can be relaxed in the future (if/when base
  backup support is added.)
* recovery timeout option: I decided to include it, but I think the use case is
  too narrow. It helps in broken setups. However, it can be an issue in some
  scenarios, such as a time-delayed replica, a large replication lag, slow
  hardware, or a slow network. I didn't use the proposed patch [5]. Instead, I
  came up with a simple one that defaults to forever. The proposed one defaults
  to 60 seconds, but I'm afraid that in one of the scenarios mentioned above we
  would cancel a legitimate run. Maybe we should add a message during dry run
  saying that due to replication lag, it will take longer to run.
* refactor primary_slot_name code. With the new setup_publisher and
  setup_subscriber functions, I split the function that detects the
  primary_slot_name use into two pieces, just to avoid extra connections to get
  the job done.
* remove fallback_application_name as suggested by Hayato [5] because logical
  replication already includes one.

I'm still thinking about replacing --subscriber-conninfo with separate items
(username, port, password?, host = socket dir). Maybe it is overengineering.
The user can always prepare the environment to avoid unwanted and/or external
connections.

I haven't changed the name from pg_subscriber to pg_createsubscriber yet, but
if I don't hear objections, I'll do it in the next patch.




--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Jan 25, 2024, at 6:05 AM, Hayato Kuroda (Fujitsu) wrote:
01.
```
/* Options */
static char *pub_conninfo_str = NULL;
static SimpleStringList database_names = {NULL, NULL};
static int wait_seconds = DEFAULT_WAIT;
static bool retain = false;
static bool dry_run = false;
```

Just to confirm - is there a policy on how we store the specified options? If
we store them as globals, username and port should follow suit (my fault...).
Or should we have a structure to store them?

It is a matter of style, I would say. Check other client applications. Some of
them also use global variables. There are others that group options into a
struct. I would say that since the options have a short lifetime, the current
style is not harmful.

04.
```
{"dry-run", no_argument, NULL, 'n'},
```

I'm not sure why the dry_run mode exists. For pg_resetwal, it shows which
values would be changed based on the input. As for pg_upgrade, it checks
whether the node can be upgraded. I think we should have the checking feature,
so it should be renamed to --check. Also, the process should exit earlier in
that case.

It is extremely useful because (a) you have a physical replication setup and
don't know if it is prepared for logical replication; (b) it checks GUCs (is
max_wal_senders sufficient for this pg_subscriber command? Is
max_replication_slots sufficient to set up the logical replication even though
I already have some used replication slots?); (c) connectivity; and (d)
credentials.

05.
I felt we should accept some settings from environment variables, like pg_upgrade.
Currently, the items below should be accepted.

- data directory
- username
- port
- timeout

Maybe PGDATA.

06.
```
pg_logging_set_level(PG_LOG_WARNING);
```

If the default log level is warning, there is no way to output debug logs
(a single -v option only raises the level by one, so INFO would be output).
I think it should be PG_LOG_INFO.

You need to specify multiple -v options.

07.
Can we combine verifications into two functions, e.g., check_primary() and check_standby/check_subscriber()?

I think v9 does it.

08.
Not sure, but if we want to record the output of pg_subscriber, a sub-directory
should be created. Its name should contain the timestamp.

The log file already contains the timestamp. Why?

09.
Not sure, but should we check max_slot_wal_keep_size on the primary server? It
can avoid a failure when starting logical replication.

Is physical replication that was broken *before* running this tool its
responsibility? Hmm. We might add another check that can be surfaced during dry
run mode.

10.
```
nslots_new = nslots_old + dbarr.ndbs;

if (nslots_new > max_replication_slots)
{
pg_log_error("max_replication_slots (%d) must be greater than or equal to "
"the number of replication slots required (%d)", max_replication_slots, nslots_new);
exit(1);
}
```

I think the standby server must not have replication slots, because the
subsequent pg_resetwal command discards all the WAL files, so the WAL records
they point to are removed. Currently pg_resetwal does not raise an ERROR in
that case.

Again, dry run mode might provide a message for it.

11.
```
/*
* Stop the subscriber if it is a standby server. Before executing the
* transformation steps, make sure the subscriber is not running because
* one of the steps is to modify some recovery parameters that require a
* restart.
*/
if (stat(pidfile, &statbuf) == 0)
```

I kept it just in case, but I'm not sure it is still needed. What do you think?
Removing it would remove the need to include pidfile.h.

Are you suggesting another way to check if the standby is up and running?

12.
```
pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s",
  standby.bindir, standby.pgdata);
rc = system(pg_ctl_cmd);
pg_ctl_status(pg_ctl_cmd, rc, 0);
```


There are two places to stop the instance. Can you divide it into a function?

Yes.

13.
```
* A temporary replication slot is not used here to avoid keeping a
* replication connection open (depending when base backup was taken, the
* connection should be open for a few hours).
*/
conn = connect_database(primary.base_conninfo, dbarr.perdb[0].dbname);
if (conn == NULL)
exit(1);
consistent_lsn = create_logical_replication_slot(conn, true,
&dbarr.perdb[0]);
```

I hadn't noticed the comment, and I'm still not sure of the reason. Why must we
keep the slot until pg_subscriber finishes? IIUC, the slot is never used; it is
created only to get a consistent_lsn, so we do not have to keep it.
Also, just before that, logical replication slots for each database are created,
so the WAL records are surely retained.

This comment needs to be updated. It was written at the time I was pursuing
base backup support too. It doesn't matter if you remove this transient
replication slot earlier, because all of the replication slots created for the
subscriptions were created *before* the one for the consistent LSN. Hence,
there is no additional WAL retention due to this transient replication slot.

14.

```
pg_log_info("starting the subscriber");
start_standby_server(&standby, subport, server_start_log);
```

This info should be in the function.

Ok.

15.
```
/*
* Create a subscription for each database.
*/
for (i = 0; i < dbarr.ndbs; i++)
```

This can be divided into a function, like create_all_subscriptions().

Ok.

16.
My fault: usage() must be updated.

17. use_primary_slot_name
```
if (PQntuples(res) != 1)
{
pg_log_error("could not obtain replication slot information: got %d rows, expected %d row",
PQntuples(res), 1);
return NULL;
}
```

Error message should be changed. I think this error means the standby has wrong primary_slot_name, right?

I refactored this code a bit, but the message is the same. It detects 2 cases:
(a) you set primary_slot_name but don't have a replication slot with the same
name, and (b) a cannot-happen bug that returns > 1 rows. It is a broken setup,
so maybe a hint saying so is enough.

18. misc
Sometimes the pid of pg_subscriber is referred to. It could be stored in a global variable.

I prefer to keep getpid() call.

19.
C99-style has been allowed, so loop variables like "i" can be declared in the for-statement, like

```
for (int i = 0; i < MAX; i++)
```

v9 does it.

20.
Some comments, docs, and outputs must be fixed when the name is changed.

Next patch.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch! Before reading yours, I wanted to reply to some of the comments.

>
I'm still thinking about replacing --subscriber-conninfo with separate items
(username, port, password?, host = socket dir). Maybe it is an overengineering.
The user can always prepare the environment to avoid unwanted and/or external
connections.
>

For me, the required amount of fixes is not so different from the current one.
What do others think?

>
It is extremely useful because (a) you have a physical replication setup and
don't know if it is prepared for logical replication, (b) check GUCs (is
max_wal_senders sufficient for this pg_subscriber command? Or is
max_replication_slots sufficient to setup the logical replication even though I
already have some used replication slots?), (c) connectivity and (d)
credentials.
>

Yeah, it is useful for verification purposes, so let's keep this option.
But I still think the name should be "--check". Also, there are many
`if (!dry_run)` checks, but most of them could be removed if the process
exited earlier. Thoughts?


>
> 05.
> I felt we should accept some settings from enviroment variables, like pg_upgrade.
> Currently, below items should be acceted.
>
> - data directory
> - username
> - port
> - timeout

Maybe PGDATA.
>

Sorry, I cannot follow this. Did you mean that the target data directory should
be able to be specified via PGDATA? If so, +1.

>
> 06.
> ```
> pg_logging_set_level(PG_LOG_WARNING);
> ```
>
> If the default log level is warning, there are no ways to output debug logs.
> (-v option only raises one, so INFO would be output)
> I think it should be PG_LOG_INFO.

You need to specify multiple -v options.
>

Hmm. I felt the specification was a bit strange... but at least it must be
described in the documentation. pg_dump.sgml has similar lines.

>
> 08.
> Not sure, but if we want to record outputs by pg_subscriber, the sub-directory
> should be created. The name should contain the timestamp.

The log file already contains the timestamp. Why?
>

This comment assumed the output of pg_subscriber was also recorded to a file.
In that case, if the file has the same timestamp, they can be gathered in the
same place. It is not needed if the output is not recorded.


>
> 09.
> Not sure, but should we check max_slot_wal_keep_size of primary server? It can
> avoid to fail starting of logical replicaiton.

A broken physical replication *before* running this tool is its responsibility?
Hmm. We might add another check that can be noticed during dry run mode.
>

I thought that we should not generate any broken objects, but indeed, I'm not
sure it is in our scope. What do others think?


>
> 11.
> ```
> /*
> * Stop the subscriber if it is a standby server. Before executing the
> * transformation steps, make sure the subscriber is not running because
> * one of the steps is to modify some recovery parameters that require a
> * restart.
> */
> if (stat(pidfile, &statbuf) == 0)
> ```
>
> I kept just in case, but I'm not sure it is still needed. How do you think?
> Removing it can reduce an inclusion of pidfile.h.

Are you suggesting another way to check if the standby is up and running?
>

Running `pg_ctl stop` itself can detect whether the process is still alive.
It exits with 1 when the process is not there.

>
> I didn't notice the comment, but I am still not sure of the reason. Why must we reserve
> the slot until pg_subscriber finishes? IIUC, the slot would never be used; it
> is created only for getting a consistent_lsn, so we do not have to keep it.
> Also, just before that, logical replication slots for each database are created, so
> WAL records are surely reserved.
>

I want to confirm the conclusion - will you remove the creation of a transient slot?

Also (not tested yet), I'm now considering that we can reuse the primary_conninfo value.
We are assuming that the target server is a standby and that the current upstream server
will be converted into the publisher. In this case, the connection string is already
specified as primary_conninfo, so --publisher-conninfo may not be needed. The parameter
does not contain a database name, so --databases is still needed. I imagine something like:

1. Parse options
2. Turn on standby
3. Verify the standby
4. Turn off standby
5. Get primary_conninfo from standby
6. Connect to primary (concat of primary_conninfo and an option is used)
7. Verify the primary
...

What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/




RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Again, thanks for updating the patch! There are my random comments for v9.

01.
I cannot find your reply to my comment #7 [1], but you reverted the related changes.
I'm not sure whether you are still considering it or decided not to include the changes.
Can you clarify your opinion?
(This matters because the changes are large, so they affect other development...)

02.
```
+       <term><option>-t <replaceable class="parameter">seconds</replaceable></option></term>
+       <term><option>--timeout=<replaceable class="parameter">seconds</replaceable></option></term>
```

But the source code requires `--recovery-timeout`. Please update one of them.

03.
```
+ *    Create a new logical replica from a standby server
```

Junwang suggested changing this, but the change was reverted [2].
Can you clarify your opinion as well?

04.
```
+/*
+ * Is the source server ready for logical replication? If so, create the
+ * publications and replication slots in preparation for logical replication.
+ */
+static bool
+setup_publisher(LogicalRepInfo *dbinfo)
```

But this function also verifies the source server. I felt the verification should be
in a different function.

05.
```
+/*
+ * Is the target server ready for logical replication?
+ */
+static bool
+setup_subscriber(LogicalRepInfo *dbinfo)
```

Actually, this function does not set up the subscriber. It just verifies whether the
target can become a subscriber, right? It should be renamed.

06.
```
+    atexit(cleanup_objects_atexit);
```

The registration of the cleanup function happens too early. This sometimes triggers
a core dump. E.g.,

```
$ pg_subscriber --publisher-conninfo --subscriber-conninfo 'user=postgres port=5432' --verbose --database 'postgres'
--pgdata data_N2/
pg_subscriber: error: too many command-line arguments (first is "user=postgres port=5432")
pg_subscriber: hint: Try "pg_subscriber --help" for more information.
Segmentation fault (core dumped)

$ gdb ...
(gdb) bt
#0  cleanup_objects_atexit () at pg_subscriber.c:131
#1  0x00007fb982cffce9 in __run_exit_handlers () from /lib64/libc.so.6
#2  0x00007fb982cffd37 in exit () from /lib64/libc.so.6
#3  0x00000000004054e6 in main (argc=9, argv=0x7ffc59074158) at pg_subscriber.c:1500
(gdb) f 3
#3  0x00000000004054e6 in main (argc=9, argv=0x7ffc59074158) at pg_subscriber.c:1500
1500                    exit(1);
(gdb) list
1495            if (optind < argc)
1496            {
1497                    pg_log_error("too many command-line arguments (first is \"%s\")",
1498                                             argv[optind]);
1499                    pg_log_error_hint("Try \"%s --help\" for more information.", progname);
1500                    exit(1);
1501            }
1502
1503            /*
1504             * Required arguments
```

I still think it should be done just before the creation of objects [3].

07.
Missing a removal of publications on the standby.

08.
Missing registration of LogicalRepInfo in the typedefs.list.

09.
```
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_subscriber</command>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
```

Can you reply to my comment #2 [4]? I think the mandatory options should be written.

10.
Just to confirm - will you implement the start_standby/stop_standby functions in the next version?

[1]:
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/CAEG8a3%2BwL_2R8n12BmRz7yBP3EBNdHDhmdgxQFA9vS%2BzPR%2B3Kw%40mail.gmail.com
[3]:
https://www.postgresql.org/message-id/TY3PR01MB9889678E47B918F4D83A6FD8F57B2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[4]:
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/




Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Fri, Jan 26, 2024, at 4:55 AM, Hayato Kuroda (Fujitsu) wrote:
Again, thanks for updating the patch! There are my random comments for v9.

Thanks for checking v9. I already incorporated some of the points below into
the next patch. Give me a couple of hours to include some important points.

01.
I cannot find your reply to my comment #7 [1], but you reverted the related changes.
I'm not sure whether you are still considering it or decided not to include the changes.
Can you clarify your opinion?
(This matters because the changes are large, so they affect other development...)

It is still on my list. As I said in a previous email, I'm having a hard time
reviewing pieces from your 0002 patch because you included a bunch of things
in one patch.

02.
```
+       <term><option>-t <replaceable class="parameter">seconds</replaceable></option></term>
+       <term><option>--timeout=<replaceable class="parameter">seconds</replaceable></option></term>
```

But the source code requires `--recovery-timeout`. Please update one of them.

Oops. Fixed. My preference is --recovery-timeout because someone may later decide to
add a --timeout option to this tool.

03.
```
+ *    Create a new logical replica from a standby server
```

Junwang suggested changing this, but the change was reverted [2].
Can you clarify your opinion as well?

I'll review the documentation once I fix the code. Since the tool includes the
*create* verb in its name, it seems strange to use another verb (convert) in the
description. Maybe we should just remove the word *new*, and a long description
can explain that the action is to turn the standby (physical replica) into a logical
replica.

04.
```
+/*
+ * Is the source server ready for logical replication? If so, create the
+ * publications and replication slots in preparation for logical replication.
+ */
+static bool
+setup_publisher(LogicalRepInfo *dbinfo)
```

But this function also verifies the source server. I felt the verification should be
in a different function.

I split setup_publisher() into check_publisher() and setup_publisher().

05.
```
+/*
+ * Is the target server ready for logical replication?
+ */
+static bool
+setup_subscriber(LogicalRepInfo *dbinfo)
```

Actually, this function does not set up the subscriber. It just verifies whether the
target can become a subscriber, right? It should be renamed.

I renamed setup_subscriber() -> check_subscriber().

06.
```
+    atexit(cleanup_objects_atexit);
```

The registration of the cleanup function happens too early. This sometimes triggers
a core dump. E.g.,

I forgot to apply the atexit() patch.

07.
Missing a removal of publications on the standby.

It was on my list to do. It will be in the next patch.

08.
Missing registration of LogicalRepInfo in the typedefs.list.

Done.

09.
```
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_subscriber</command>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
```

Can you reply to my comment #2 [4]? I think the mandatory options should be written.

I included the mandatory options into the synopsis.

10.
Just to confirm - will you implement the start_standby/stop_standby functions in the next version?

It is still on my list.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

>
It is still on my list. As I said in a previous email I'm having a hard time
reviewing pieces from your 0002 patch because you include a bunch of things
...
It is still on my list.
>

I understood that the patches we posted were hard to handle. Sorry for the inconvenience.
So I extracted only the changes related to them. Can you review them and include
them if they seem OK?

v10-0001: same as v9-0001.
0002: not related to our changes, but this fixes a small bug.
          The third argument of getopt_long was not correct.
0003: adds the start_standby/stop_standby functions.
0004: tries to refactor some structures.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/


Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Sun, Jan 28, 2024, at 10:10 PM, Euler Taveira wrote:
On Fri, Jan 26, 2024, at 4:55 AM, Hayato Kuroda (Fujitsu) wrote:
Again, thanks for updating the patch! There are my random comments for v9.

Thanks for checking v9. I already incorporated some of the points below into
the next patch. Give me a couple of hours to include some important points.

Here it is another patch that includes the following changes:

* rename the tool to pg_createsubscriber: it was the name with the most votes
  [1].
* fix recovery-timeout option [2]
* refactor: split setup_publisher into check_publisher (that contains only GUC
  checks) and setup_publisher (that does what the name suggests) [2]
* doc: verbose option can be specified multiple times [2]
* typedefs.list: add LogicalRepInfo [2]
* fix: register cleanup routine after the data structure (dbinfo) is assigned
  [2]
* doc: add mandatory options to synopsis [3]
* refactor: rename some options (such as publisher-conninfo to
  publisher-server) to use the same pattern as pg_rewind.
* feat: remove publications from subscriber [2]
* feat: use temporary replication slot to get consistent LSN [3]
* refactor: move subscriber setup to its own function
* refactor: stop standby server to its own function [2]
* refactor: start standby server to its own function [2]
* fix: getopt options [4]

There are a few open items on my list. I included v10-0002. I had already
included a refactoring to add start/stop functions, so I didn't include
v10-0003. I'll check v10-0004 tomorrow.

One open item that is worrying me is how to handle the pg_ctl timeout. This
patch does nothing about it, and the user should use the PGCTLTIMEOUT environment
variable to avoid the execution being canceled after 60 seconds (the default for
pg_ctl). Even if you set a high value, it might not be enough for cases like a
time-delayed replica. Maybe pg_ctl should accept 'no timeout' as a --timeout
option. I'll include this caveat in the documentation, but I'm afraid it is
not sufficient and we should provide a better way to handle this situation.



--
Euler Taveira

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch!

> One open item that is worrying me is how to handle the pg_ctl timeout. This
> patch does nothing about it, and the user should use the PGCTLTIMEOUT environment
> variable to avoid the execution being canceled after 60 seconds (the default for
> pg_ctl). Even if you set a high value, it might not be enough for cases like a
> time-delayed replica. Maybe pg_ctl should accept 'no timeout' as a --timeout
> option. I'll include this caveat in the documentation, but I'm afraid it is
> not sufficient and we should provide a better way to handle this situation.

I think you might be confused a bit. Even if recovery_min_apply_delay is set to,
e.g., 10 hours, pg_ctl can still start and stop the server. This is because the
walreceiver serializes changes as soon as they are received. The delay is applied by
the startup process. There is no unflushed data, so the server instance can be turned
off. If you meant the combination of recovery-timeout and a time-delayed replica,
yes, it would be likely to occur. But in that case, an option like --no-timeout is
dangerous. I think we should overwrite recovery_min_apply_delay to zero. Thoughts?

The part below contains my comments for v11-0001. Note that the ordering is random.

01. doc
```
    <group choice="req">
     <arg choice="plain"><option>-D</option> </arg>
     <arg choice="plain"><option>--pgdata</option></arg>
    </group>
```

According to other documentation, like pg_upgrade's, we do not write both the long
and short options in the synopsis section.

02. doc
```
  <para>
   <application>pg_createsubscriber</application> takes the publisher and subscriber
   connection strings, a cluster directory from a physical replica and a list of
   database names and it sets up a new logical replica using the physical
   recovery process.
  </para>

```

I found that you did not include my suggestion without saying why [1]. Do you dislike
the comment, or are you still considering it?

03. doc
```
      <term><option>-P  <replaceable class="parameter">connstr</replaceable></option></term>
```

Too many blanks after -P.

04. doc
```
    <para>
     <application>pg_createsubscriber</application> checks if the target data
     directory is used by a physical replica. Stop the physical replica if it is
     running. One of the next steps is to add some recovery parameters that
     requires a server start. This step avoids an error.
    </para>
```

I think the doc was not updated. Currently (not sure it is good), the physical replica
must be running before running pg_createsubscriber.

05. doc
```
     each specified database on the source server. The replication slot name
     contains a <literal>pg_createsubscriber</literal> prefix. These replication
     slots will be used by the subscriptions in a future step.  A temporary
```

According to the "Replication Slot Management" subsection in logical-replication.sgml,
the format of the names should be described explicitly, e.g.:

```
These replication slots have generated names: "pg_createsubscriber_%u_%d" (parameters: subscription oid, process id pid).
```

Publications, a temporary slot, and subscriptions should be described as well.

06. doc
```
     Next, <application>pg_createsubscriber</application> creates one publication
     for each specified database on the source server. Each publication
     replicates changes for all tables in the database. The publication name

```

Missing update. Publications are created before creating replication slots.

07. general
I think there are some commenting conventions in PG, but this file breaks them.

* Comments should be one line wherever possible.
* Periods must not be added when the comment is a single line.
* The initial character must be a capital letter.

08. general
Some pg_log_error() + exit(1) can be replaced with pg_fatal().


09. LogicalRepInfo
```
    char       *subconninfo;    /* subscription connection string for logical
                                 * replication */
```

As I posted in comment #8 [2], I don't think it is a "subscription connection". Also,
"for logical replication" is a bit misleading because it would not be passed to
workers.

10. get_base_conninfo
```
static char *
get_base_conninfo(char *conninfo, char *dbname, const char *noderole)
...
        /*
         * If --database option is not provided, try to obtain the dbname from
         * the publisher conninfo. If dbname parameter is not available, error
         * out.
         */

```

I'm not sure getting the dbname from the conninfo improves the user experience. I felt
it may target an unintended database.
(I still think publisher-server should be removed.)

11. check_data_directory
```
/*
 * Is it a cluster directory? These are preliminary checks. It is far from
 * making an accurate check. If it is not a clone from the publisher, it will
 * eventually fail in a future step.
 */
static bool
check_data_directory(const char *datadir)
```

We should also check whether pg_createsubscriber can create a file and a directory.
GetDataDirectoryCreatePerm() verifies this.

12. main
```
    /* Register a function to clean up objects in case of failure. */
    atexit(cleanup_objects_atexit);
```

According to the manpage, callback functions are not called when a process exits
abnormally due to a signal:

> Functions  registered  using atexit() (and on_exit(3)) are not called if a
> process terminates abnormally because of the delivery of a signal.

Do you have a good way to handle this case? One solution is to output the created
objects at some log level, but that consideration may be too much. Thoughts?

13. main
```
    /*
     * Create a temporary logical replication slot to get a consistent LSN.
```

Just to clarify - I still think the process should exit before this point in the case
of a dry run. In the case of pg_resetwal, the process exits before doing actual work
like RewriteControlFile().

14. main
```
     * XXX we might not fail here. Instead, we provide a warning so the user
     * eventually drops this replication slot later.
```

But there are possibilities of exit(1) in drop_replication_slot(). Is that acceptable?

15. wait_for_end_recovery
```
        /*
         * Bail out after recovery_timeout seconds if this option is set.
         */
        if (recovery_timeout > 0 && timer >= recovery_timeout)
```

Hmm, IIUC, it should be enabled by default [3]. Do you have anything in mind?

16. main
```
    /*
     * Create the subscription for each database on subscriber. It does not
     * enable it immediately because it needs to adjust the logical
     * replication start point to the LSN reported by consistent_lsn (see
     * set_replication_progress). It also cleans up publications created by
     * this tool and replication to the standby.
     */
    if (!setup_subscriber(dbinfo, consistent_lsn))
```

Subscriptions are created and the replication origin is moved forward here,
but the latter can be done only by a superuser. I felt that this should be
checked in check_subscriber().

17. main
```
    /*
     * Change system identifier.
     */
    modify_sysid(pg_resetwal_path, subscriber_dir);
```

Even if I execute without the -v option, output from the pg_resetwal command appears.
It seems a bit strange.

```
$ pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres" -S "port=5432 user=postgres" -d postgres
Write-ahead log reset
$
```

[1]:
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CAA4eK1JRgnhv_ySzuFjN7UaX9qxz5Hqcwew7Fk%3D%2BAbJbu0Kd9w%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/




Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:

On Thu, Jan 18, 2024 at 6:19 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> Very early in this thread, someone mentioned the name
> pg_create_subscriber, and of course there is pglogical_create_subscriber
> as the historical predecessor.  Something along those lines seems better
> to me.  Maybe there are other ideas.
>

I mentioned it upthread because of this pet project [1], which is one of the motivations behind upstreaming this facility.


--
Fabrízio de Royes Mello

Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:

On Wed, Jan 31, 2024 at 9:52 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Dear Euler,
>
> I extracted some review comments which may require much effort. I hope this makes them
> easy to review.
>
> 0001: not changed from yours.
> 0002: avoid using replication connections. Source: comment #3[1]
> 0003: Remove -P option and use primary_conninfo instead. Source: [2]
> 0004: Exit earlier when dry_run is specified. Source: [3]
> 0005: Refactor data structures. Source: [4]
>
> [1]: https://www.postgresql.org/message-id/TY3PR01MB9889593399165B9A04106741F5662%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [2]: https://www.postgresql.org/message-id/TY3PR01MB98897C85700C6DF942D2D0A3F5792%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [3]: https://www.postgresql.org/message-id/TY3PR01MB98897C85700C6DF942D2D0A3F5792%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [4]: https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
>

Hey folks,

Jumping into this a bit late here... I'm trying a simple pg_createsubscriber but getting an error:

~/pgsql took 19s
✦ ➜ pg_createsubscriber -d fabrizio -r -D /tmp/replica5434 -S 'host=/tmp port=5434'
pg_createsubscriber: error: could not create subscription "pg_createsubscriber_16384_695617" on database "fabrizio": ERROR:  syntax error at or near "/"
LINE 1: ..._16384_695617 CONNECTION 'user=fabrizio passfile='/home/fabr...
                                                             ^
pg_createsubscriber: error: could not drop replication slot "pg_createsubscriber_16384_695617" on database "fabrizio":
pg_createsubscriber: error: could not drop replication slot "pg_subscriber_695617_startpoint" on database "fabrizio": ERROR:  replication slot "pg_subscriber_695617_startpoint" does not exist

And the LOG contains the following:

~/pgsql took 12s
✦ ➜ cat /tmp/replica5434/pg_createsubscriber_output.d/server_start_20240131T110318.730.log
2024-01-31 11:03:19.138 -03 [695632] LOG:  starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-01-31 11:03:19.138 -03 [695632] LOG:  listening on IPv6 address "::1", port 5434
2024-01-31 11:03:19.138 -03 [695632] LOG:  listening on IPv4 address "127.0.0.1", port 5434
2024-01-31 11:03:19.158 -03 [695632] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5434"
2024-01-31 11:03:19.179 -03 [695645] LOG:  database system was shut down in recovery at 2024-01-31 11:03:18 -03
2024-01-31 11:03:19.180 -03 [695645] LOG:  entering standby mode
2024-01-31 11:03:19.192 -03 [695645] LOG:  redo starts at 0/4000028
2024-01-31 11:03:19.198 -03 [695645] LOG:  consistent recovery state reached at 0/504DB08
2024-01-31 11:03:19.198 -03 [695645] LOG:  invalid record length at 0/504DB08: expected at least 24, got 0
2024-01-31 11:03:19.198 -03 [695632] LOG:  database system is ready to accept read-only connections
2024-01-31 11:03:19.215 -03 [695646] LOG:  started streaming WAL from primary at 0/5000000 on timeline 1
2024-01-31 11:03:29.587 -03 [695645] LOG:  recovery stopping after WAL location (LSN) "0/504F260"
2024-01-31 11:03:29.587 -03 [695645] LOG:  redo done at 0/504F260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 10.39 s
2024-01-31 11:03:29.587 -03 [695645] LOG:  last completed transaction was at log time 2024-01-31 11:03:18.761544-03
2024-01-31 11:03:29.587 -03 [695646] FATAL:  terminating walreceiver process due to administrator command
2024-01-31 11:03:29.598 -03 [695645] LOG:  selected new timeline ID: 2
2024-01-31 11:03:29.680 -03 [695645] LOG:  archive recovery complete
2024-01-31 11:03:29.690 -03 [695643] LOG:  checkpoint starting: end-of-recovery immediate wait
2024-01-31 11:03:29.795 -03 [695643] LOG:  checkpoint complete: wrote 51 buffers (0.3%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.021 s, sync=0.034 s, total=0.115 s; sync files=17, longest=0.011 s, average=0.002 s; distance=16700 kB, estimate=16700 kB; lsn=0/504F298, redo lsn=0/504F298
2024-01-31 11:03:29.805 -03 [695632] LOG:  database system is ready to accept connections
2024-01-31 11:03:30.332 -03 [695658] ERROR:  syntax error at or near "/" at character 90
2024-01-31 11:03:30.332 -03 [695658] STATEMENT:  CREATE SUBSCRIPTION pg_createsubscriber_16384_695617 CONNECTION 'user=fabrizio passfile='/home/fabrizio/.pgpass' channel_binding=prefer host=localhost port=5432 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable dbname=fabrizio' PUBLICATION pg_createsubscriber_16384 WITH (create_slot = false, copy_data = false, enabled = false)

Seems we need to escape connection params similar to what we do in dblink [1]

Regards,


--
Fabrízio de Royes Mello

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Jan 31, 2024, at 11:25 AM, Fabrízio de Royes Mello wrote:
Jumping into this a bit late here... I'm trying a simple pg_createsubscriber but getting an error:

Try v11. It seems v12-0002 is not correct.

Seems we need to escape connection params similar we do in dblink [1]

I think it is a consequence of v12-0003. I didn't review v12 yet, but although I
added a comment saying it might be possible to use primary_conninfo, I'm
not 100% convinced that's the right direction.

        /*
         * TODO use primary_conninfo (if available) from subscriber and
         * extract publisher connection string. Assume that there are
         * identical entries for physical and logical replication. If there is
         * not, we would fail anyway.
         */


--
Euler Taveira

Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:


On Wed, Jan 31, 2024 at 11:35 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Jan 31, 2024, at 11:25 AM, Fabrízio de Royes Mello wrote:
>
> Jumping into this a bit late here... I'm trying a simple pg_createsubscriber but getting an error:
>
>
> Try v11. It seems v12-0002 is not correct.

Using v11 I'm getting this error:

~/pgsql took 22s
✦ ➜ pg_createsubscriber -d fabrizio -r -D /tmp/replica5434 -S 'host=/tmp port=5434' -P 'host=/tmp port=5432'
NOTICE:  changed the failover state of replication slot "pg_createsubscriber_16384_706609" on publisher to false
pg_createsubscriber: error: could not drop replication slot "pg_createsubscriber_706609_startpoint" on database "fabrizio": ERROR:  replication slot "pg_createsubscriber_706609_startpoint" does not exist
Write-ahead log reset

Attached the output log.

Regards,

--
Fabrízio de Royes Mello
Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Jan 31, 2024, at 11:55 AM, Fabrízio de Royes Mello wrote:

On Wed, Jan 31, 2024 at 11:35 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Jan 31, 2024, at 11:25 AM, Fabrízio de Royes Mello wrote:
>
> Jumping into this a bit late here... I'm trying a simple pg_createsubscriber but getting an error:
>
>
> Try v11. It seems v12-0002 is not correct.

Using v11 I'm getting this error:

~/pgsql took 22s
✦ ➜ pg_createsubscriber -d fabrizio -r -D /tmp/replica5434 -S 'host=/tmp port=5434' -P 'host=/tmp port=5432'
NOTICE:  changed the failover state of replication slot "pg_createsubscriber_16384_706609" on publisher to false
pg_createsubscriber: error: could not drop replication slot "pg_createsubscriber_706609_startpoint" on database "fabrizio": ERROR:  replication slot "pg_createsubscriber_706609_startpoint" does not exist
Write-ahead log reset

Hmm. I didn't try it with the failover patch that was recently applied. Did you
have any special configuration on the primary?


--
Euler Taveira

Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:

On Wed, Jan 31, 2024 at 12:38 PM Euler Taveira <euler@eulerto.com> wrote:
>
>
> Hmm. I didn't try it with the failover patch that was recently applied. Did you
> have any special configuration on the primary?
>

Nothing special; here are the configurations I've changed after bootstrap:

port = '5432'
wal_level = 'logical'
max_wal_senders = '8'
max_replication_slots = '6'
hot_standby_feedback = 'on'
max_prepared_transactions = '10'
max_locks_per_transaction = '512'

Regards,

--
Fabrízio de Royes Mello

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Jan 30, 2024, at 6:26 AM, Hayato Kuroda (Fujitsu) wrote:
> One open item that is worrying me is how to handle the pg_ctl timeout. This
> patch does nothing about it, and the user should use the PGCTLTIMEOUT environment
> variable to avoid the execution being canceled after 60 seconds (the default for
> pg_ctl). Even if you set a high value, it might not be enough for cases like a
> time-delayed replica. Maybe pg_ctl should accept 'no timeout' as a --timeout
> option. I'll include this caveat in the documentation, but I'm afraid it is
> not sufficient and we should provide a better way to handle this situation.

I think you might be confused a bit. Even if recovery_min_apply_delay is set to,
e.g., 10 hours, pg_ctl can still start and stop the server. This is because the
walreceiver serializes changes as soon as they are received. The delay is applied by
the startup process. There is no unflushed data, so the server instance can be turned
off. If you meant the combination of recovery-timeout and a time-delayed replica,
yes, it would be likely to occur. But in that case, an option like --no-timeout is
dangerous. I think we should overwrite recovery_min_apply_delay to zero. Thoughts?

I didn't provide the whole explanation. I'm envisioning the use case where pg_ctl
doesn't reach the consistent state and the timeout is reached (the consequence
is that pg_createsubscriber aborts the execution). It might occur on a busy
server. The probability that it occurs with the current code is low (the LSN gap
for recovery is small). Maybe I'm anticipating issues for when base backup
support is added, but it is better to raise concerns during development.

The part below contains my comments for v11-0001. Note that the ordering is random.

Hayato, thanks for reviewing v11.

01. doc
```
    <group choice="req">
     <arg choice="plain"><option>-D</option> </arg>
     <arg choice="plain"><option>--pgdata</option></arg>
    </group>
```

According to other documentation, like pg_upgrade's, we do not write both the long
and short options in the synopsis section.

pg_upgrade doesn't, but others do, like pg_rewind, pg_resetwal, pg_controldata,
and pg_checksums. It seems newer tools tend to provide both short and long options.

02. doc
```
  <para>
   <application>pg_createsubscriber</application> takes the publisher and subscriber
   connection strings, a cluster directory from a physical replica and a list of
   database names and it sets up a new logical replica using the physical
   recovery process.
  </para>

```

I found that you did not include my suggestion without saying why [1]. Do you dislike
the comment, or are you still considering it?

Documentation is on my list. I didn't fix the documentation because some design
decisions changed. I'm still working on it.

03. doc
```
      <term><option>-P  <replaceable class="parameter">connstr</replaceable></option></term>
```

Too many blanks after -P.

Fixed.

[documentation related items will be addressed later...]


07. general
I think there are some commenting conventions in PG, but this file breaks them.

It is on my list.

08. general
Some pg_log_error() + exit(1) can be replaced with pg_fatal().

Done. I kept a few pg_log_error() + exit() because there is no
pg_fatal_and_hint() function.


09. LogicalRepInfo
```
char    *subconninfo; /* subscription connection string for logical
* replication */
```

As I posted in comment #8 [2], I don't think it is a "subscription connection". Also,
"for logical replication" is a bit misleading because it would not be passed to
workers.

Done.

s/publication/publisher/
s/subscription/subscriber/

10. get_base_conninfo
```
static char *
get_base_conninfo(char *conninfo, char *dbname, const char *noderole)
...
/*
* If --database option is not provided, try to obtain the dbname from
* the publisher conninfo. If dbname parameter is not available, error
* out.
*/

```

I'm not sure getting dbname from the conninfo improves the user experience. I felt
it may target an unintended database.
(I still think the publisher-server option should be removed)

Why not? A single database is a common setup. It is only unintended if you don't
document it accordingly. I'll make sure it is advertised in the --database and
--publisher-server options.

11. check_data_directory
```
/*
* Is it a cluster directory? These are preliminary checks. It is far from
* making an accurate check. If it is not a clone from the publisher, it will
* eventually fail in a future step.
*/
static bool
check_data_directory(const char *datadir)
```

We should also check whether pg_createsubscriber can create files and directories.
GetDataDirectoryCreatePerm() verifies it.

Good point. It is included in the next patch.

12. main
```
/* Register a function to clean up objects in case of failure. */
atexit(cleanup_objects_atexit);
```

According to the manpage, functions registered with atexit() are not called when
the process terminates because of a signal:

> Functions  registered  using atexit() (and on_exit(3)) are not called if a
> process terminates abnormally because of the delivery of a signal.

Do you have a good way to handle this case? One solution is to log the created
objects at some level, but that may be overkill. Thoughts?

Nothing? If you interrupt the execution, there will be objects left behind, and
you, as the one who decided to interrupt it, have to clean things up. What do you
expect this tool to do? The documentation will provide guidance on the object
name patterns this tool uses so you can check for leftover objects. Someone
could argue that this is a valid feature request, but IMO it is not at the top
of the list.

13. main
```
/*
* Create a temporary logical replication slot to get a consistent LSN.
```

Just to clarify: I still think the process should exit before this point in the
dry-run case. pg_resetwal, for instance, exits before doing the actual work like
RewriteControlFile().

Why? Are you suggesting that the dry run mode covers just the verification
part? If so, it is not a dry run mode. I would expect it to run until the end
(or until it accomplishes its goal) but *not* modify data. For pg_resetwal,
the modification is one of the last steps, and the other ones that are skipped
(the KillFoo functions) modify data. It ends the dry run when it accomplishes
its goal (obtaining the new control data values). If we stop earlier, some of
the additional steps won't be covered by the dry run mode, and a failure could
occur that would have been detected had we run a few more steps.

14. main
```
* XXX we might not fail here. Instead, we provide a warning so the user
* eventually drops this replication slot later.
```

But there are code paths that exit(1) in drop_replication_slot(). Is that acceptable?

No, there aren't.

15. wait_for_end_recovery
```
/*
* Bail out after recovery_timeout seconds if this option is set.
*/
if (recovery_timeout > 0 && timer >= recovery_timeout)
```

Hmm, IIUC, it should be enabled by default [3]. Do you have anything in mind?

Why? See [1]. I prefer the kind mode (always wait until recovery ends), but you
and Amit are proposing a more aggressive mode. The proposal (-t 60) seems ok
right now; however, if the goal is to provide base backup support in the future,
you will certainly have to add --recovery-timeout on big clusters or those with
heavy workloads, because the base backup runs between replication slot creation
and obtaining the consistent LSN. Of course, we can change the default when base
backup support is added.

16. main
```
/*
* Create the subscription for each database on subscriber. It does not
* enable it immediately because it needs to adjust the logical
* replication start point to the LSN reported by consistent_lsn (see
* set_replication_progress). It also cleans up publications created by
* this tool and replication to the standby.
*/
if (!setup_subscriber(dbinfo, consistent_lsn))
```

Subscriptions would be created and the replication origin would be moved forward
here, but the latter can only be done by a superuser. I felt that this should be
checked in check_subscriber().

Good point. I included a check for pg_create_subscription role and CREATE
privilege on the specified database.

17. main
```
/*
* Change system identifier.
*/
modify_sysid(pg_resetwal_path, subscriber_dir);
```

Even if I execute without the -v option, output from the pg_resetwal command
appears. It seems a bit strange.

pg_resetwal uses printf and there is no prefix identifying that the message
comes from pg_resetwal. That message has been bothering me for a while, so let's
send it to /dev/null. I'll include it in the next patch.

```
    RewriteControlFile();
    KillExistingXLOG();
    KillExistingArchiveStatus();
    KillExistingWALSummaries();
    WriteEmptyXLOG();

    printf(_("Write-ahead log reset\n"));
    return 0;
```




--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Fabrízio,

Thanks for reporting. I understood that the issue occurred on v11 and v12.
I will try to reproduce and check the reason.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for giving comments! I want to reply some of them.

>
I didn't provide the whole explanation. I'm envisioning the use case where pg_ctl
doesn't reach the consistent state and the timeout is reached (the consequence
is that pg_createsubscriber aborts the execution). It might occur on a busy
server. The probability that it occurs with the current code is low (the LSN gap
for recovery is small). Maybe I'm anticipating issues for when base backup
support is added, but it is better to raise concerns during development.
>

Hmm, actually I didn't know about that case. Thanks for the explanation. I want
to see how you describe it in the docs.

>
pg_upgrade doesn't, but others do, like pg_rewind, pg_resetwal, pg_controldata,
and pg_checksums. Newer tools tend to provide both short and long options.
>

Oh, you are right.

>
Nothing? If you interrupt the execution, there will be objects left behind, and
you, as the one who decided to interrupt it, have to clean things up. What do you
expect this tool to do? The documentation will provide guidance on the object
name patterns this tool uses so you can check for leftover objects. Someone
could argue that this is a valid feature request, but IMO it is not at the top
of the list.
>

OK, so let's keep current style.

>
Why? Are you suggesting that the dry run mode covers just the verification
part? If so, it is not a dry run mode. I would expect it to run until the end
(or until it accomplishes its goal) but *not* modify data. For pg_resetwal,
the modification is one of the last steps, and the other ones that are skipped
(the KillFoo functions) modify data. It ends the dry run when it accomplishes
its goal (obtaining the new control data values). If we stop earlier, some of
the additional steps won't be covered by the dry run mode, and a failure could
occur that would have been detected had we run a few more steps.
>

Yes, that was my expectation. I'm still not sure which problems can be detected
by the dry run, but we can keep it for now.

>
Why? See [1]. I prefer the kind mode (always wait until recovery ends), but you
and Amit are proposing a more aggressive mode. The proposal (-t 60) seems ok
right now; however, if the goal is to provide base backup support in the future,
you will certainly have to add --recovery-timeout on big clusters or those with
heavy workloads, because the base backup runs between replication slot creation
and obtaining the consistent LSN. Of course, we can change the default when base
backup support is added.
>

Sorry, I missed your previous post. Let's keep yours.

>
Good point. I included a check for pg_create_subscription role and CREATE
privilege on the specified database.
>

I'm not sure, but can the replication origin functions be executed with those
privileges? According to the docs [1], they seem unrelated.

[1]: https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-REPLICATION

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/




Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Jan 31, 2024, at 11:09 PM, Hayato Kuroda (Fujitsu) wrote:
>
Why? Are you suggesting that the dry run mode covers just the verification
part? If so, it is not a dry run mode. I would expect it to run until the end
(or until it accomplishes its goal) but *not* modify data. For pg_resetwal,
the modification is one of the last steps, and the other ones that are skipped
(the KillFoo functions) modify data. It ends the dry run when it accomplishes
its goal (obtaining the new control data values). If we stop earlier, some of
the additional steps won't be covered by the dry run mode, and a failure could
occur that would have been detected had we run a few more steps.
>

Yes, that was my expectation. I'm still not sure which problems can be detected
by the dry run, but we can keep it for now.

The main goal is to have information for troubleshooting.


Good point. I included a check for pg_create_subscription role and CREATE
privilege on the specified database.
>

I'm not sure, but can the replication origin functions be executed with those
privileges? According to the docs [1], they seem unrelated.

Hmm. No. :( Better add this check too.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Fabrízio, Euler,

I think you set primary_slot_name on the standby server, right?
While reading codes, I found below line in v11-0001.
```
    if (primary_slot_name != NULL)
    {
        conn = connect_database(dbinfo[0].pubconninfo);
        if (conn != NULL)
        {
            drop_replication_slot(conn, &dbinfo[0], temp_replslot);
        }
```

Since temp_replslot is now a temporary slot, it is removed automatically, so
this function causes the error: replication slot "pg_createsubscriber_%u_startpoint" does not exist.
Also, the physical slot still remains on the primary.
In short, "temp_replslot" should be "primary_slot_name".

PSA a script file for reproducing.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Fabrízio, Euler,

I made fix patches to solve the issues reported by Fabrízio.

* v13-0001: Same as v11-0001 made by Euler.
* v13-0002: Fixes ERRORs while dropping replication slots [1].
           If you want to see the code we have agreed on, please apply up to 0002.
=== experimental patches ===
* v13-0003: Avoids using replication connections. The issue [2] was solved in my environment.
* v13-0004: Removes -P option and use primary_conninfo instead.
* v13-0005: Refactors data structures

[1]: https://www.postgresql.org/message-id/CAFcNs%2BrSG9DcEewsoA%3D85DXhSRh%2BnyKrrcr64FEDytcZf6QaEQ%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAFcNs%2BpPtw%2By7Be00BK0MBpHhLk2s66tLM286g%3Dk5rew8kUxjg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Feb 1, 2024, at 9:47 AM, Hayato Kuroda (Fujitsu) wrote:
I made fix patches to solve the issues reported by Fabrízio.

* v13-0001: Same as v11-0001 made by Euler.
* v13-0002: Fixes ERRORs while dropping replication slots [1].
           If you want to see the code we have agreed on, please apply up to 0002.
=== experimental patches ===
* v13-0003: Avoids using replication connections. The issue [2] was solved in my environment.
* v13-0004: Removes -P option and use primary_conninfo instead.
* v13-0005: Refactors data structures

Thanks for rebasing the proposed patches. I'm attaching a new patch.

As I said in the previous email [1] I fixed some issues from your previous review:

* use pg_fatal() if possible. In some cases pg_log_error() + exit(1) couldn't be
  replaced because a hint is required.
* pg_resetwal output. Send standard output to /dev/null to avoid extra message.
* check privileges. Make sure the current role can execute CREATE SUBSCRIPTION
  and pg_replication_origin_advance().
* log directory. Refactor code that setup the log file used as server log.
* run with restricted token (Windows).
* v13-0002. Merged.
* v13-0003. I applied a modified version. I returned only the required
  information for each query.
* group command-line options into a new struct CreateSubscriberOptions. The
  exception is the dry_run option.
* rename functions that obtain system identifier.

WIP

I'm still working on the data structures to group options. I don't like the way
they were grouped in v13-0005; there are too many levels to reach the database
name. The setup_subscriber() function requires all 3 data structures.

The documentation update is almost there. I will include the modifications in
the next patch.

Regarding v13-0004, it seems like a good UI; that's why I wrote a comment about
it. However, it comes with a restriction: it requires a similar HBA rule for
both regular and replication connections. Is that an acceptable restriction? We
might paint ourselves into a corner. A reasonable proposal is not to remove this
option but to make it optional. If it is not provided, primary_conninfo is used.



--
Euler Taveira

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch!

>
I'm still working on the data structures to group options. I don't like the way
they were grouped in v13-0005; there are too many levels to reach the database
name. The setup_subscriber() function requires all 3 data structures.
>

Right, your refactoring has fewer levels, so I'll pause revising my refactoring
patch.

>
The documentation update is almost there. I will include the modifications in
the next patch.
>

OK. I think it should be fixed before native speakers join the thread.

>
Regarding v13-0004, it seems like a good UI; that's why I wrote a comment about
it. However, it comes with a restriction: it requires a similar HBA rule for
both regular and replication connections. Is that an acceptable restriction? We
might paint ourselves into a corner. A reasonable proposal is not to remove this
option but to make it optional. If it is not provided, primary_conninfo is used.
>

I didn't have that point of view. However, it is not related to whether -P
exists or not. Even v14-0001 requires the primary to accept both normal and
replication connections. If we want to avoid that, the connection from
pg_createsubscriber can be switched back to a replication connection.
(I felt we do not have to use the replication protocol even if we change the
connection mode.)

The motivation for removing -P is to ensure the consistency of the nodes.
pg_createsubscriber assumes that the -P option can connect to the upstream node,
but no one checks it. Parsing two connection strings may be a solution, but it
would be confusing, e.g., what if some options differ?
I think using the same parameter is the simplest solution.

And below part contains my comments for v14.

01.
```
char        temp_replslot[NAMEDATALEN] = {0};
```

I found that no one refers to the name of the temporary slot. Can we remove the variable?

02.
```
    CreateSubscriberOptions opt;
...
    memset(&opt, 0, sizeof(CreateSubscriberOptions));

    /* Default settings */
    opt.subscriber_dir = NULL;
    opt.pub_conninfo_str = NULL;
    opt.sub_conninfo_str = NULL;
    opt.database_names = (SimpleStringList)
    {
        NULL, NULL
    };
    opt.retain = false;
    opt.recovery_timeout = 0;
```

Initialization with `CreateSubscriberOptions opt = {0};` seems sufficient;
all members are set to zero.

03.
```
/*
 * Is the standby server ready for logical replication?
 */
static bool
check_subscriber(LogicalRepInfo *dbinfo)
```

You said "target server must be a standby" in [1], but I cannot find a check for it.
IIUC, there are two approaches:
 a) check for the existence of "standby.signal" in the data directory
 b) call the SQL function pg_is_in_recovery()

04.
```
static char *pg_ctl_path = NULL;
static char *pg_resetwal_path = NULL;
```

I still think they can be combined as "bindir".

05.

```
    /*
     * Write recovery parameters.
...
        WriteRecoveryConfig(conn, opt.subscriber_dir, recoveryconfcontents);
```

WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but I'm not
sure that is good. These settings would remain on the new subscriber even after
pg_createsubscriber finishes. Can we avoid that? One idea is to pass these
parameters via the pg_ctl -o option, but that requires parsing the output of
GenerateRecoveryConfig() (all GUCs must be aligned like "-c XXX -c XXX -c XXX...").

06.
```
static LogicalRepInfo *store_pub_sub_info(SimpleStringList dbnames, const char *pub_base_conninfo, const char
*sub_base_conninfo);
...
static void modify_subscriber_sysid(const char *pg_resetwal_path, CreateSubscriberOptions opt);
...
static void wait_for_end_recovery(const char *conninfo, CreateSubscriberOptions opt);
```

Function arguments should not be structs, because they are passed by value;
they should be pointers. Or, for modify_subscriber_sysid() and
wait_for_end_recovery(), we can pass only the values that are actually used.

07.
```
static char *get_base_conninfo(char *conninfo, char *dbname,
                               const char *noderole);
```

I'm not sure noderole should be passed here; it is used only for logging.
Can we log the string before calling the function instead?
(The parameter is not needed anymore if -P is removed)

08.
The terminology is still not consistent. Some functions call the target
"standby" while others call it "subscriber".

09.
v14 does not work if the standby server already has recovery_target* options
set. PSA the reproducer. I considered two approaches:

 a) raise an ERROR when these parameters are set; check_subscriber() can do it
 b) overwrite these GUCs with empty strings.

10.
The execution always fails if the user runs --dry-run just beforehand, because
pg_createsubscriber stops the standby anyway. Doing a dry run first is a quite
normal use case, so the current implementation does not seem user-friendly. How
should we fix it? Below are my ideas:

 a) avoid stopping the standby in the dry_run case: seems possible.
 b) accept the case where the standby is already stopped: seems possible.
 c) start the standby at the end of the run: how should arguments like pg_ctl -l be specified?

My top-up patches fix some issues.

v15-0001: same as v14-0001
=== experimental patches ===
v15-0002: Use replication connections when connecting to the primary.
          Connections to the standby are unchanged because the standby/subscriber
          does not require that type of connection, in principle.
          If we can accept connecting to the subscriber in replication mode,
          this can be simplified.
v15-0003: Remove -P and use primary_conninfo instead. Same as v13-0004.
v15-0004: Check whether the target is really a standby. This is done via pg_is_in_recovery().
v15-0005: Avoid stopping/starting the standby server in dry_run mode,
          i.e., approach a) in #10.
v15-0006: Overwrite recovery parameters, i.e., approach b) in #9.

[1]: https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Attachment

Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Fri, Feb 2, 2024 at 3:11 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> [...]
>
> My top-up patches fix some issues.
>
> v15-0001: same as v14-0001
> === experimental patches ===
> v15-0002: Use replication connections when we connects to the primary.
>           Connections to standby is not changed because the standby/subscriber
>           does not require such type of connection, in principle.
>           If we can accept connecting to subscriber with replication mode,
>           this can be simplified.
> v15-0003: Remove -P and use primary_conninfo instead. Same as v13-0004
> v15-0004: Check whether the target is really standby. This is done by pg_is_in_recovery()
> v15-0005: Avoid stopping/starting standby server in dry_run mode.
>           I.e., approach a). in #10 is used.
> v15-0006: Overwrite recovery parameters. I.e., aproach b). in #9 is used.
>
> [1]: https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com


While reviewing the v15 patches, I discovered that the subscription connection
string now includes a lot of options which are not required:
v15-0001
postgres=# select subname, subconninfo from pg_subscription;
            subname            |               subconninfo
-------------------------------+------------------------------------------
pg_createsubscriber_5_1867633 | host=localhost port=5432 dbname=postgres
(1 row)



v15-0001+0002+0003
postgres=# select subname, subconninfo from pg_subscription;
            subname            |                       subconninfo
-------------------------------+------------------------------------------------
 pg_createsubscriber_5_1895366 | user=shubham passfile='/home/shubham/.pgpass' channel_binding=prefer host=127.0.0.1 port=5432 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable dbname=postgres
(1 row)


Here, we can see that channel_binding, sslmode, sslcertmode, sslsni,
gssencmode, krbsrvname, etc. are getting included. This does not look
intentional; we should keep the subscription connection string the same as in
v15-0001.

Thanks and Regards,
Shubham Khanna.





RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

> 03.
> ```
> /*
>  * Is the standby server ready for logical replication?
>  */
> static bool
> check_subscriber(LogicalRepInfo *dbinfo)
> ```
> 
> You said "target server must be a standby" in [1], but I cannot find checks for it.
> IIUC, there are two approaches:
>  a) check the existence "standby.signal" in the data directory
>  b) call an SQL function "pg_is_in_recovery"

I found that the current code (HEAD+v14-0001) gets stuck or times out because
recovery never finishes. The attached script emulates the case where the
standby.signal file is missing. The script always fails with the output below:

```
...
pg_createsubscriber: waiting the postmaster to reach the consistent state
pg_createsubscriber: postmaster was stopped
pg_createsubscriber: error: recovery timed out
...
```

I also attached the server log taken during pg_createsubscriber; we can see that
the walreceiver exits before receiving all the required changes. I cannot
pinpoint the code path yet, but the inconsistency leads to a situation in which
the walreceiver has exited while the startup process remains. This also suggests
that the recovery status must be checked in check_subscriber().

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

Thanks for testing our codes!

> While reviewing the v15 patches, I discovered that the subscription connection
> string now includes a lot of options which are not required:
> v15-0001
> postgres=# select subname, subconninfo from pg_subscription;
>             subname            |               subconninfo
> -------------------------------+------------------------------------------
> pg_createsubscriber_5_1867633 | host=localhost port=5432 dbname=postgres
> (1 row)
> 
> 
> 
> v15-0001+0002+0003
> postgres=# select subname, subconninfo from pg_subscription;
>             subname            |                       subconninfo
> -------------------------------+------------------------------------------------
>  pg_createsubscriber_5_1895366 | user=shubham passfile='/home/shubham/.pgpass' channel_binding=prefer host=127.0.0.1 port=5432 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable dbname=postgres
> (1 row)
> 
> 
> Here, we can see that channel_binding, sslmode, sslcertmode, sslsni,
> gssencmode, krbsrvname, etc. are getting included. This does not look
> intentional; we should keep the subscription connection string the same as in
> v15-0001.

You should attach the reproducer script. I suspect you used the pg_basebackup
-R command to set up the standby. In that case, it is intentional: these
settings are not added by pg_createsubscriber, they come from pg_basebackup.
If you set primary_conninfo manually, as in the attached script, these settings
do not appear.

In the first place, the listed options (e.g., passfile, channel_binding,
sslmode, sslcompression, sslcertmode, etc.) are set whenever you connect to the
database via libpq functions. PQconninfoOptions in fe-connect.c lists the
parameters and their default values.

v15-0003 reuses primary_conninfo for the subconninfo attribute. primary_conninfo
is set by pg_basebackup when the '-R' option is specified.
Its content is built in GenerateRecoveryConfig(), which takes the parameters
from PQconninfo(). That function returns all the libpq connection parameters,
even those set to their defaults, so primary_conninfo looks longer.

I don't think this works incorrectly. Users can still set an arbitrary
connection string as primary_conninfo; you just end up with a longer string
unintentionally.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

> My top-up patches fix some issues.
>
> v15-0001: same as v14-0001
> === experimental patches ===
> v15-0002: Use replication connections when we connect to the primary.
>           Connections to the standby are not changed because the standby/subscriber
>           does not require that type of connection, in principle.
>           If we can accept connecting to subscriber with replication mode,
>           this can be simplified.
> v15-0003: Remove -P and use primary_conninfo instead. Same as v13-0004
> v15-0004: Check whether the target is really standby. This is done by pg_is_in_recovery()
> v15-0005: Avoid stopping/starting standby server in dry_run mode.
>           I.e., approach a). in #10 is used.
> v15-0006: Overwrite recovery parameters. I.e., approach b). in #9 is used.
>
> [1]: https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com
>

I have created a topup patch 0007 on top of v15-0006.

I revived the patch which removes the -S option and adds some options instead.
The patch adds options for --port, --username and --socketdir. It also ensures
that no one can connect to the standby while pg_createsubscriber is running, by
setting listen_addresses, unix_socket_permissions, and
unix_socket_directories.

Thanks and Regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote:
Thanks for updating the patch!

Thanks for taking a look.

>
I'm still working on the data structures to group options. I don't like the way
it was grouped in v13-0005. There are too many levels to reach the database
name. The setup_subscriber() function requires the 3 data structures.
>

Right, your refactoring uses fewer levels, so I have paused revising my
refactoring patch.

I didn't complete this task yet so I didn't include it in this patch.

>
The documentation update is almost there. I will include the modifications in
the next patch.
>

OK. I think it should be modified before native speakers join the thread.

Same for this one.

>
Regarding v13-0004, it seems a good UI; that's why I wrote a comment about it.
However, it comes with a restriction that requires a similar HBA rule for both
regular and replication connections. Is that an acceptable restriction? We
might paint ourselves into a corner. A reasonable proposal is not to remove
this option. Instead, it should be optional. If it is not provided,
primary_conninfo is used.
>

I didn't have such a point of view. However, it is not related to whether -P
exists or not. Even v14-0001 requires the primary to accept both normal and
replication connections. If we want to avoid that, the connections from
pg_createsubscriber can be restored to replication connections.
(I felt we do not have to use the replication protocol even if we change the
connection mode.)

That's correct. We made a decision to use non-physical-replication connections
(besides the one used to connect primary <-> standby). Hence, my concern about
an HBA rule falls apart. I'm not convinced that using a modified
primary_conninfo is the only/right answer to fill the subscription connection
string. Physical and logical replication have different requirements when we
talk about users. Physical replication requires only the REPLICATION privilege.
On the other hand, to create a subscription you must have the privileges of the
pg_create_subscription role and also the CREATE privilege on the specified
database. Unless you are always recommending the superuser for this tool, I'm
afraid there will be cases where reusing primary_conninfo will be an issue. The
more I think about this UI, the more I think that, if it is not hundreds of
lines of code, it should use primary_conninfo when -P is not specified.

The motivation for removing -P is to ensure the consistency of nodes.
pg_createsubscriber assumes that the -P option can connect to the upstream
node, but nothing checks it. Parsing two connection strings may be a solution
but would be confusing. E.g., what if some options are different?
I think using the same parameter is the simplest solution.

Ugh. An error will occur the first time it tries to connect to the primary
(get_primary_sysid).

I found that nothing refers to the name of the temporary slot. Can we remove the variable?

It is gone. I refactored the create_logical_replication_slot function. The slot
name is assigned internally (no need for slot_name or temp_replslot) and the
temporary parameter is included.

Initialization by `CreateSubscriberOptions opt = {0};` seems enough.
All values are set to 0x0.

It is. However, I keep the assignments for 3 reasons: (a) there might be
parameters whose default value is not zero, (b) the standard does not say that
a null pointer must be represented by zero, and (c) there is no harm in being
paranoid during the initial assignment.

You said "target server must be a standby" in [1], but I cannot find a check
for it. IIUC, there are two approaches:
a) check for the existence of "standby.signal" in the data directory
b) call the SQL function "pg_is_in_recovery"

I applied v16-0004 that implements option (b).

I still think they can be combined as "bindir".

I applied a patch that has a single variable for BINDIR.

WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but I'm
not sure it is good. These settings would remain on the new subscriber even
after pg_createsubscriber finishes. Can we avoid that? I came up with passing
these parameters via the pg_ctl -o option, but it requires parsing the output
from GenerateRecoveryConfig() (all GUCs must be aligned like "-c XXX -c XXX
-c XXX...").

I applied a modified version of v16-0006.

Function arguments should not be structs because they are passed by value;
they should be pointers. Or, for modify_subscriber_sysid and
wait_for_end_recovery, we can pass only the values that are really used.

Done.

07.
```
static char *get_base_conninfo(char *conninfo, char *dbname,
   const char *noderole);
```

Not sure noderole should be passed here. It is used only for logging.
Can we output the string before calling the function?
(The parameter is not needed anymore if -P is removed.)

Done.

08.
The terminology is still not consistent. Some functions refer to the target as
"standby", while others call it "subscriber".

The terminology should reflect the actual server role. I'm calling it "standby"
if it is a physical replica and "subscriber" if it is a logical replica. Maybe
"standby" isn't clear enough.

09.
v14 does not work if the standby server already has recovery_target* options
set. PSA the reproducer. I considered two approaches:
a) raise an ERROR when these parameters are set; check_subscriber() can do it
b) overwrite these GUCs with empty strings

I prefer (b); that's exactly what you provided in v16-0006.

10.
Execution always fails if users ran --dry-run just before, because
pg_createsubscriber stops the standby anyway. Doing a dry run first is quite a
normal use case, so the current implementation does not seem user-friendly.
How should we fix it? The bullets below are my ideas:

a) avoid stopping the standby in case of dry_run: seems possible.
b) accept even if the standby is stopped: seems possible.
c) start the standby at the end of the run: how should arguments like
   pg_ctl -l be specified?

I prefer (a). I applied a slightly modified version of v16-0005.

This new patch contains the following changes:

* check whether the target is really a standby server (0004)
* refactor: pg_create_logical_replication_slot function
* use a single variable for pg_ctl and pg_resetwal directory
* avoid recovery errors applying default settings for some GUCs (0006)
* don't stop/start the standby in dry run mode (0005)
* miscellaneous fixes

I don't understand why v16-0002 is required. In a previous version, this patch
was using connections in logical replication mode. After some discussion we
decided to change it to regular connections and use SQL functions (instead of
replication commands). Is it a requirement for v16-0003?

I started reviewing v16-0007 but haven't finished yet. The general idea is ok.
However, I'm still worried about preventing some use cases if it provides only
the local connection option. What if you want to keep monitoring this instance
while the transformation is happening? Let's say it has a backlog that will
take some time to apply. Unless you have a local agent, you have no data about
this server until pg_createsubscriber terminates. Even a local agent might not
connect to the server unless you use the current port.


--
Euler Taveira

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shlok,

Thanks for updating the patch!

> I have created a topup patch 0007 on top of v15-0006.
> 
> I revived the patch which removes -S option and adds some options
> instead. The patch add option for --port, --username and --socketdir.
> This patch also ensures that anyone cannot connect to the standby
> during the pg_createsubscriber, by setting listen_addresses,
> unix_socket_permissions, and unix_socket_directories.

IIUC, there are two reasons why removing -S may be good:

* it forces users to specify a local connection, and
* it avoids connection establishment on the standby while pg_createsubscriber
  runs.

The first bullet is still valid, but we should describe it like pg_upgrade
does:

>
pg_upgrade will connect to the old and new servers several times, so you might
want to set authentication to peer in pg_hba.conf or use a ~/.pgpass file
(see Section 33.16).
>

Regarding the second bullet, this patch cannot ensure it. pg_createsubscriber
just accepts a port number on which the standby is already listening; it does
not change the port number. So any local application on the standby server can
connect to the server and may change primary_conninfo, primary_slot_name,
data, etc. So... what should the behavior be? What do others think?

Besides, 0007 does not follow the recent changes in 0001. E.g., options should
be recorded in CreateSubscriberOptions. Also, should we check the privileges
of the socket directory?

[1]:
https://www.postgresql.org/message-id/TY3PR01MB988902B992A4F2E99E1385EDF56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Sorry, our e-mails crossed. I will read your patch, but I want to reply to one
of your points first.

>
I started reviewing v16-0007 but haven't finished yet. The general idea is ok.
However, I'm still worried about preventing some use cases if it provides only
the local connection option. What if you want to keep monitoring this instance
while the transformation is happening? Let's say it has a backlog that will
take some time to apply. Unless you have a local agent, you have no data about
this server until pg_createsubscriber terminates. Even a local agent might not
connect to the server unless you use the current port.
>

(IIUC, 0007 could not overwrite a port number - refactoring is needed)

Ah, actually I did not have such a point of view. Assuming that a changed port
number can prevent connection establishment, there are four options:
a) Overwrite neither port nor listen_addresses. This allows monitoring by
   external agents, but someone can modify GUCs and data during the operation.
b) Overwrite port but not listen_addresses. Not sure this is useful...
c) Overwrite listen_addresses but not port. This allows monitoring by local
   agents, and we can partially protect the database. But there is still some
   room.
d) Overwrite both port and listen_addresses. This protects the database
   completely, but no one can monitor it.

Hmm, which one should be chosen? I prefer c) or d).
Do you know what pglogical_create_subscriber does?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

I have not finished reviewing, but I can reply to you first.

>
I didn't complete this task yet so I didn't include it in this patch.
>

Just to confirm: there is no need to forcibly include my refactoring patch;
you can modify it based on your own idea.

>
That's correct. We made a decision to use non-physical-replication connections
(besides the one used to connect primary <-> standby). Hence, my concern about
an HBA rule falls apart. I'm not convinced that using a modified
primary_conninfo is the only/right answer to fill the subscription connection
string. Physical and logical replication have different requirements when we
talk about users. Physical replication requires only the REPLICATION privilege.
On the other hand, to create a subscription you must have the privileges of the
pg_create_subscription role and also the CREATE privilege on the specified
database. Unless you are always recommending the superuser for this tool, I'm
afraid there will be cases where reusing primary_conninfo will be an issue. The
more I think about this UI, the more I think that, if it is not hundreds of
lines of code, it should use primary_conninfo when -P is not specified.
>

Valid point. It is a best practice for users to set minimal privileges for
each account. However...

>
Ugh. An error will occur the first time (get_primary_sysid) it tries to connect
to primary.
>

I'm not sure that is correct. I think there are several patterns.

a) The -P option specified a ghost server, i.e., pg_createsubscriber cannot
   connect to anything. In this case, get_primary_sysid() would fail because
   connect_database() would fail.

b) The -P option specified an existing server, but it is not my upstream.
   get_primary_sysid() would succeed.
   In this case, there are two further branches:

   b-1) The nodes have different system_identifiers. pg_createsubscriber would
        raise an ERROR and stop.
   b-2) The nodes have the same system_identifier (e.g., the nodes were
        generated from the same master). In this case, the current checks
        would pass but pg_createsubscriber would get stuck or reach the
        timeout. PSA the reproducer.

Can pg_createsubscriber check that the two nodes are actually connected by the
replication protocol? Or can we assume that all nodes surely have different
system_identifiers?

>
It is. However, I keep the assignments for 3 reasons: (a) there might be
parameters whose default value is not zero, (b) the standard does not say that
a null pointer must be represented by zero, and (c) there is no harm in being
paranoid during the initial assignment.
>

Hmm, so there is no need to use `= {0};`, but it's up to you. Also, the C99
standard [1] defines a null pointer constant as the integer constant 0:
```
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.
If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
```

>
> WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but I'm
> not sure it is good. These settings would remain on the new subscriber even
> after pg_createsubscriber finishes. Can we avoid that? I came up with passing
> these parameters via the pg_ctl -o option, but it requires parsing the output
> from GenerateRecoveryConfig() (all GUCs must be aligned like "-c XXX -c XXX
> -c XXX...").

I applied a modified version of v16-0006.
>

Sorry for the confusion; that is not a solution. This approach writes
parameter settings to postgresql.auto.conf, and they are never removed. The
overwritten settings would remain.

>
I don't understand why v16-0002 is required. In a previous version, this patch
was using connections in logical replication mode. After some discussion we
decided to change it to regular connections and use SQL functions (instead of
replication commands). Is it a requirement for v16-0003?
>

No, it is not required by 0003. I forgot the decision that we require DBAs to
allow normal connection establishment for pg_createsubscriber. Please ignore
it...

[1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf


Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Feb 7, 2024, at 2:31 AM, Hayato Kuroda (Fujitsu) wrote:
Ah, actually I did not have such a point of view. Assuming that a changed port
number can prevent connection establishment, there are four options:
a) Overwrite neither port nor listen_addresses. This allows monitoring by
   external agents, but someone can modify GUCs and data during the operation.
b) Overwrite port but not listen_addresses. Not sure this is useful...
c) Overwrite listen_addresses but not port. This allows monitoring by local
   agents, and we can partially protect the database. But there is still some
   room.
d) Overwrite both port and listen_addresses. This protects the database
   completely, but no one can monitor it.

Remember that the target server was a standby (read-only access). I don't
expect an application to try to modify it, unless it is a buggy application.
Regarding GUCs, almost all of them are PGC_POSTMASTER (so they cannot be
modified unless the server is restarted). The ones that are not PGC_POSTMASTER
do not affect the pg_createsubscriber execution [1].

postgres=# select name, setting, context from pg_settings where name in ('max_replication_slots', 'max_logical_replication_workers', 'max_worker_processes', 'max_sync_workers_per_subscription', 'max_parallel_apply_workers_per_subscription');
                    name                     | setting |  context   
---------------------------------------------+---------+------------
max_logical_replication_workers             | 4       | postmaster
max_parallel_apply_workers_per_subscription | 2       | sighup
max_replication_slots                       | 10      | postmaster
max_sync_workers_per_subscription           | 2       | sighup
max_worker_processes                        | 8       | postmaster
(5 rows)

I'm just pointing out that this case is different from pg_upgrade (from which
this idea was taken). I'm not saying it's a bad idea. I'm just arguing that you
might be preventing some read-only access (monitoring) when it is perfectly
fine to connect to the database and execute queries. As I said before, the
current UI allows anyone to set up the standby to accept only local
connections. Of course, it is an extra step, but it is possible. However, once
you apply v16-0007, there is no option but to use only local connections
during the transformation. Is that an acceptable limitation?

On reflection, I don't expect a big window

1802     /*
1803      * Start subscriber and wait until accepting connections.
1804      */
1805     pg_log_info("starting the subscriber");
1806     if (!dry_run)
1807         start_standby_server(pg_bin_dir, opt.subscriber_dir, server_start_log);
1808 
1809     /*
1810      * Waiting the subscriber to be promoted.
1811      */
1812     wait_for_end_recovery(dbinfo[0].subconninfo, pg_bin_dir, &opt);
.
.
.
1845     /*
1846      * Stop the subscriber.
1847      */
1848     pg_log_info("stopping the subscriber");
1849     if (!dry_run)
1850         stop_standby_server(pg_bin_dir, opt.subscriber_dir);

... mainly because the majority of the time will be spent in
wait_for_end_recovery() if the server takes some time to reach a consistent
state (and during this phase it cannot accept connections anyway). Aren't we
worrying too much about it?

Hmm, which one should be chosen? I prefer c) or d).
Do you know how pglogical_create_subscriber does?

pglogical_create_subscriber does nothing [2][3].




--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

>
Remember that the target server was a standby (read-only access). I don't
expect an application to try to modify it, unless it is a buggy application.
>

What if a client modifies the data just after the promotion?
Naively, all the changes would be accepted, but are there any issues?

>
Regarding
GUCs, almost all of them are PGC_POSTMASTER (so they cannot be modified unless
the server is restarted). The ones that are not PGC_POSTMASTER do not affect
the pg_createsubscriber execution [1].
>

IIUC, primary_conninfo and primary_slot_name are PGC_SIGHUP.

>
I'm just pointing out that this case is different from pg_upgrade (from which
this idea was taken). I'm not saying it's a bad idea. I'm just arguing that you
might be preventing some read-only access (monitoring) when it is perfectly
fine to connect to the database and execute queries. As I said before, the
current UI allows anyone to set up the standby to accept only local
connections. Of course, it is an extra step, but it is possible. However, once
you apply v16-0007, there is no option but to use only local connections
during the transformation. Is that an acceptable limitation?
>

My remaining concern is written above. If those cases are not problematic, we
may not have to restrict connections for now. In that case, the changes

1) overwriting the port number,
2) setting listen_addresses = ''

are not needed, right? IIUC, inconsistency of -P may still be problematic.

>
pglogical_create_subscriber does nothing [2][3].
>

Oh, thanks.
Just to confirm: pglogical sets shared_preload_libraries to ''; should we
follow that or not?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Here are my minor comments on v17.

01.
```
/* Options */
static const char *progname;

static char *primary_slot_name = NULL;
static bool dry_run = false;

static bool success = false;

static LogicalRepInfo *dbinfo;
static int    num_dbs = 0;
```

The comment seems out-of-date. There is only one option.

02. check_subscriber and check_publisher

Missing pg_catalog prefix in some lines.

03. get_base_conninfo

I think dbname would not be set. IIUC, dbname should be a pointer to a pointer.


04.

I checked the coverage and found two functions that are never called:
 - drop_subscription
 - drop_replication_slot

Also, some cases are not tested. The bullets below show the notable ones for
me. (Some of them may not be needed, based on the discussions.)

* -r is specified
* -t is specified
* -P option contains dbname
* -d is not specified
* GUC settings are wrong
* primary_slot_name is specified on the standby
* standby server is not working

At the feature level, we may be able to check that the server log is surely
removed in case of success.

So, which tests should be added? drop_subscription() is called only during the
cleanup phase, so it may be difficult to test. As for the others, it seems
that -r and -t are not tested. GUC settings have many test cases, so I'm not
sure they all need to be covered. Based on this, the others can be tested.

PSA my top-up patch set.

V18-0001: same as your patch, v17-0001.
V18-0002: modify the alignment of codes.
V18-0003: change an argument of get_base_conninfo. Per comment 3.
=== experimental patches ===
V18-0004: Add testcases per comment 4.
V18-0005: Remove the -P option. I'm not sure it is needed, but I made it just in case.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Wed, 7 Feb 2024 at 10:24, Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote:
>
> Thanks for updating the patch!

Thanks for the updated patch. A few comments:
1) The cleanup handler flag should be reset, i.e., dbinfo->made_replslot =
false; should be set there; otherwise there will be an error while dropping
the replication slot in the error-cleanup flow:
+static void
+drop_replication_slot(PGconn *conn, LogicalRepInfo *dbinfo, const
char *slot_name)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+
+       Assert(conn != NULL);
+
+       pg_log_info("dropping the replication slot \"%s\" on database
\"%s\"", slot_name, dbinfo->dbname);
+
+       appendPQExpBuffer(str, "SELECT
pg_drop_replication_slot('%s')", slot_name);
+
+       pg_log_debug("command is: %s", str->data);

2) The cleanup handler flag should be reset, i.e., dbinfo->made_publication =
false; should be set there; otherwise there will be an error while dropping
the publication in the error-cleanup flow:
+/*
+ * Remove publication if it couldn't finish all steps.
+ */
+static void
+drop_publication(PGconn *conn, LogicalRepInfo *dbinfo)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+
+       Assert(conn != NULL);
+
+       pg_log_info("dropping publication \"%s\" on database \"%s\"",
dbinfo->pubname, dbinfo->dbname);
+
+       appendPQExpBuffer(str, "DROP PUBLICATION %s", dbinfo->pubname);
+
+       pg_log_debug("command is: %s", str->data);

3) The cleanup handler flag should be reset, i.e., dbinfo->made_subscription =
false; should be set there; otherwise there will be an error while dropping
the subscription in the error-cleanup flow:
+/*
+ * Remove subscription if it couldn't finish all steps.
+ */
+static void
+drop_subscription(PGconn *conn, LogicalRepInfo *dbinfo)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+
+       Assert(conn != NULL);
+
+       pg_log_info("dropping subscription \"%s\" on database \"%s\"",
dbinfo->subname, dbinfo->dbname);
+
+       appendPQExpBuffer(str, "DROP SUBSCRIPTION %s", dbinfo->subname);
+
+       pg_log_debug("command is: %s", str->data);

4) I was not sure if drop_publication is required here, as we will not create
any publication on the subscriber node:
+               if (dbinfo[i].made_subscription)
+               {
+                       conn = connect_database(dbinfo[i].subconninfo);
+                       if (conn != NULL)
+                       {
+                               drop_subscription(conn, &dbinfo[i]);
+                               if (dbinfo[i].made_publication)
+                                       drop_publication(conn, &dbinfo[i]);
+                               disconnect_database(conn);
+                       }
+               }

5) The connection should be closed in the error case:
+       /* secure search_path */
+       res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not clear search_path: %s",
PQresultErrorMessage(res));
+               return NULL;
+       }
+       PQclear(res);

6) There should be a line break before postgres_fe inclusion, to keep
it consistent:
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <signal.h>

7) These includes are not required:
7.a) #include <signal.h>
7.b) #include <sys/stat.h>
7.c) #include <sys/wait.h>
7.d) #include "access/xlogdefs.h"
7.e) #include "catalog/pg_control.h"
7.f) #include "common/file_utils.h"
7.g) #include "utils/pidfile.h"

+ *             src/bin/pg_basebackup/pg_createsubscriber.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <signal.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/wait.h>
+#include <time.h>
+
+#include "access/xlogdefs.h"
+#include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
+#include "common/connect.h"
+#include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
+#include "common/logging.h"
+#include "common/restricted_token.h"
+#include "fe_utils/recovery_gen.h"
+#include "fe_utils/simple_list.h"
+#include "getopt_long.h"
+#include "utils/pidfile.h"

8) POSTMASTER_STANDBY and POSTMASTER_FAILED are not used; are they required,
or kept for future use?
+enum WaitPMResult
+{
+       POSTMASTER_READY,
+       POSTMASTER_STANDBY,
+       POSTMASTER_STILL_STARTING,
+       POSTMASTER_FAILED
+};

9) pg_createsubscriber should be placed after pg_basebackup to maintain a
consistent order:
diff --git a/src/bin/pg_basebackup/.gitignore b/src/bin/pg_basebackup/.gitignore
index 26048bdbd8..b3a6f5a2fe 100644
--- a/src/bin/pg_basebackup/.gitignore
+++ b/src/bin/pg_basebackup/.gitignore
@@ -1,5 +1,6 @@
 /pg_basebackup
 /pg_receivewal
 /pg_recvlogical
+/pg_createsubscriber

10) The dry-run help message is not very clear; how about something similar to
pg_upgrade's message, like "check clusters only, don't change any data":
+       printf(_(" -d, --database=DBNAME               database to
create a subscription\n"));
+       printf(_(" -n, --dry-run                       stop before
modifying anything\n"));
+       printf(_(" -t, --recovery-timeout=SECS         seconds to wait
for recovery to end\n"));
+       printf(_(" -r, --retain                        retain log file
after success\n"));
+       printf(_(" -v, --verbose                       output verbose
messages\n"));
+       printf(_(" -V, --version                       output version
information, then exit\n"));

Regards,
Vignesh



Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Wed, Feb 7, 2024 at 10:24 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote:
>
> Thanks for updating the patch!
>
>
> Thanks for taking a look.
>
> >
> I'm still working on the data structures to group options. I don't like the way
> it was grouped in v13-0005. There is too many levels to reach database name.
> The setup_subscriber() function requires the 3 data structures.
> >
>
> Right, your refactoring looks fewer stack. So I pause to revise my refactoring
> patch.
>
>
> I didn't complete this task yet so I didn't include it in this patch.
>
> >
> The documentation update is almost there. I will include the modifications in
> the next patch.
> >
>
> OK. I think it should be modified before native speakers will attend to the
> thread.
>
>
> Same for this one.
>
> >
> Regarding v13-0004, it seems a good UI that's why I wrote a comment about it.
> However, it comes with a restriction that requires a similar HBA rule for both
> regular and replication connections. Is it an acceptable restriction? We might
> paint ourselves into the corner. A reasonable proposal is not to remove this
> option. Instead, it should be optional. If it is not provided, primary_conninfo
> is used.
> >
>
> I didn't have such a point of view. However, it is not related whether -P exists
> or not. Even v14-0001 requires primary to accept both normal/replication connections.
> If we want to avoid it, the connection from pg_createsubscriber can be restored
> to replication-connection.
> (I felt we do not have to use replication protocol even if we change the connection mode)
>
>
> That's correct. We made a decision to use non physical replication connections
> (besides the one used to connect primary <-> standby). Hence, my concern about
> a HBA rule falls apart. I'm not convinced that using a modified
> primary_conninfo is the only/right answer to fill the subscription connection
> string. Physical vs logical replication has different requirements when we talk
> about users. The physical replication requires only the REPLICATION privilege.
> On the other hand, to create a subscription you must have the privileges of
> pg_create_subscription role and also CREATE privilege on the specified
> database. Unless, you are always recommending the superuser for this tool, I'm
> afraid there will be cases where reusing primary_conninfo will be an issue. The
> more I think about this UI, the more I think that, if it is not hundreds of
> lines of code, it should use primary_conninfo when -P is not specified.
>
> The motivation why -P is not needed is to ensure the consistency of nodes.
> pg_createsubscriber assumes that the -P option can connect to the upstream node,
> but no one checks it. Parsing two connection strings may be a solution but be
> confusing. E.g., what if some options are different?
> I think using a same parameter is a simplest solution.
>
>
> Ugh. An error will occur the first time (get_primary_sysid) it tries to connect
> to primary.
>
> I found that no one refers the name of temporary slot. Can we remove the variable?
>
>
> It is gone. I did a refactor of the create_logical_replication_slot function.
> The slot name is assigned internally (no need for slot_name or temp_replslot) and
> the temporary parameter is included.
>
> Initialization by `CreateSubscriberOptions opt = {0};` seems enough.
> All values are set to 0x0.
>
>
> It is. However, I keep the assignments for three reasons: (a) there might be
> parameters whose default value is not zero, (b) the standard does not say that
> a null pointer must be represented by zero and (c) there is no harm in being
> paranoid during the initial assignment.
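To illustrate what `= {0}` does and does not guarantee, here is a minimal self-contained sketch; the struct members are hypothetical stand-ins, not the real CreateSubscriberOptions fields:

```c
#include <stdbool.h>
#include <stddef.h>

/* Trimmed, hypothetical version of the options struct discussed above */
typedef struct CreateSubscriberOptions
{
	char	   *subscriber_dir;		/* pointer member */
	int			recovery_timeout;	/* numeric member */
	bool		dry_run;			/* flag member */
} CreateSubscriberOptions;

/*
 * Aggregate initialization with "= {0}" sets pointer members to null
 * pointers and arithmetic members to zero, even on platforms where a null
 * pointer is not an all-zero bit pattern. Explicit assignments remain
 * useful for reason (a): members whose intended default is not zero.
 */
static CreateSubscriberOptions
default_options(void)
{
	CreateSubscriberOptions opt = {0};

	opt.recovery_timeout = 0;	/* explicit, in case the default changes */
	return opt;
}
```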
>
> You said "target server must be a standby" in [1], but I cannot find a check for it.
> IIUC, there are two approaches:
> a) check the existence "standby.signal" in the data directory
> b) call an SQL function "pg_is_in_recovery"
>
>
> I applied v16-0004 that implements option (b).
>
> I still think they can be combined as "bindir".
>
>
> I applied a patch that has a single variable for BINDIR.
>
> WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but I'm not
> sure that is good. These settings would remain on the new subscriber even after
> pg_createsubscriber finishes. Can we avoid it? I came up with passing these parameters
> via the pg_ctl -o option, but it requires parsing the output from GenerateRecoveryConfig()
> (all GUCs must be aligned like "-c XXX -c XXX -c XXX...").
>
>
> I applied a modified version of v16-0006.
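For reference, the pg_ctl -o alternative discussed above amounts to assembling an option string like the following; this is a hypothetical helper, not code from the patch, and a real version would also need to quote/escape GUC values:

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one GUC as "-c name=value " to an option string destined for
 * "pg_ctl -o", so the recovery settings die with the server process
 * instead of persisting in postgresql.auto.conf. Hypothetical sketch;
 * values containing spaces or quotes would need escaping.
 */
static void
append_guc_option(char *buf, size_t buflen, const char *name, const char *value)
{
	size_t		used = strlen(buf);

	snprintf(buf + used, buflen - used, "-c %s=%s ", name, value);
}
```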
>
> Function arguments should not be structs because they are passed by value.
> They should be pointers. Or, for modify_subscriber_sysid and wait_for_end_recovery,
> we can pass only the value which is really used.
>
>
> Done.
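The point of the change above can be sketched in a few lines; the struct member here is a trimmed, hypothetical version of the real LogicalRepInfo:

```c
#include <stdbool.h>

/* Trimmed, hypothetical version of the per-database state struct */
typedef struct LogicalRepInfo
{
	bool		made_replslot;
	bool		made_publication;
} LogicalRepInfo;

/*
 * Passing a pointer (rather than the struct by value) lets the callee
 * record progress in the caller's copy, which a later cleanup handler
 * inspects; a by-value copy would lose these updates and also copies the
 * whole struct on every call.
 */
static void
mark_replslot_created(LogicalRepInfo *dbinfo)
{
	dbinfo->made_replslot = true;
}
```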
>
> 07.
> ```
> static char *get_base_conninfo(char *conninfo, char *dbname,
>    const char *noderole);
> ```
>
> Not sure noderole should be passed here. It is used only for logging.
> Can we output the string before calling the function?
> (The parameter is not needed anymore if -P is removed)
>
>
> Done.
>
> 08.
> The terminology is still not consistent. Some functions refer to the target as "standby",
> but others call it "subscriber".
>
>
> The terminology should reflect the actual server role. I'm calling it "standby"
> if it is a physical replica and "subscriber" if it is a logical replica. Maybe
> "standby" isn't clear enough.
>
> 09.
> v14 does not work if the standby server already has recovery_target*
> options set. PSA the reproducer. I considered two approaches:
>
> a) raise an ERROR when these parameters are set. check_subscriber() can do it
> b) overwrite these GUCs with empty strings.
>
>
> I prefer (b); that's exactly what you provided in v16-0006.
>
> 10.
> The execution always fails if users execute --dry-run just before, because
> pg_createsubscriber stops the standby anyway. Doing a dry run first is a quite normal
> use case, so the current implementation seems not user-friendly. How should we fix it?
> The bullets below are my ideas:
>
> a) avoid stopping the standby in case of dry_run: seems possible.
> b) accept even if the standby is stopped: seems possible.
> c) start the standby at the end of run: how arguments like pg_ctl -l should be specified?
>
>
> I prefer (a). I applied a slightly modified version of v16-0005.
>
> This new patch contains the following changes:
>
> * check whether the target is really a standby server (0004)
> * refactor: pg_create_logical_replication_slot function
> * use a single variable for pg_ctl and pg_resetwal directory
> * avoid recovery errors applying default settings for some GUCs (0006)
> * don't stop/start the standby in dry run mode (0005)
> * miscellaneous fixes
>
> I don't understand why v16-0002 is required. In a previous version, this patch
> was using connections in logical replication mode. After some discussion we
> decided to change it to regular connections and use SQL functions (instead of
> replication commands). Is it a requirement for v16-0003?
>
> I started reviewing v16-0007 but haven't finished yet. The general idea is ok.
> However, I'm still worried about preventing some use cases if it provides only
> the local connection option. What if you want to keep monitoring this instance
> while the transformation is happening? Let's say it has a backlog that will
> take some time to apply. Unless you have a local agent, you have no data about
> this server until pg_createsubscriber terminates. Even the local agent might
> not connect to the server unless you use the current port.

I tried verifying a few scenarios using 5 databases and came across
the following errors:

./pg_createsubscriber -D ../new_standby -P "host=localhost port=5432
dbname=postgres" -S "host=localhost port=9000 dbname=postgres"  -d db1
-d db2 -d db3 -d db4 -d db5

pg_createsubscriber: error: publisher requires 6 wal sender
processes, but only 5 remain
pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 7.

It is successful only with 7 wal senders, so we can change error
messages accordingly.


pg_createsubscriber: error: publisher requires 6 replication slots,
but only 5 remain
pg_createsubscriber: hint: Consider increasing max_replication_slots
to at least 7.

It is successful only with 7 replication slots, so we can change error
messages accordingly.
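For what it's worth, the numbers line up if the check is sketched as follows; this is an assumption based on the design notes in this thread (one slot/walsender per database plus one for the consistent LSN, with the hint adding whatever is already in use, e.g. the standby's own physical replication connection), not the actual source:

```c
/*
 * Sketch of the capacity check behind the messages above (assumed design,
 * not actual pg_createsubscriber code): one replication slot/walsender per
 * database plus one temporary slot for the consistent LSN.
 */
static int
required_wal_senders(int num_dbs)
{
	return num_dbs + 1;			/* one per database + consistent-LSN slot */
}

/*
 * The hint adds consumers already in use, such as the standby's own
 * physical replication connection, so 5 databases yield "requires 6"
 * and "increase to at least 7".
 */
static int
suggested_max_wal_senders(int num_dbs, int in_use)
{
	return required_wal_senders(num_dbs) + in_use;
}
```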

Thanks and Regards,
Shubham Khanna,



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Further comments for v17.

01.
This program assumes that the target server has the same major version as this tool,
because the target server will be restarted by the same version's pg_ctl command.
I felt it should be ensured by reading PG_VERSION.
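A sketch of such a check; the helpers are hypothetical, and the real tool would compare against its built-in PG_MAJORVERSION:

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of the suggested check: parse the major version out
 * of the target cluster's PG_VERSION file contents and compare it with the
 * major version this tool was built for. PG_VERSION contains e.g. "17\n"
 * ("9.6\n" for pre-10 releases, where atoi() still yields the major part).
 */
static int
parse_major_version(const char *pg_version_contents)
{
	return atoi(pg_version_contents);
}

static bool
major_versions_match(const char *pg_version_contents, int tool_major)
{
	return parse_major_version(pg_version_contents) == tool_major;
}
```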

02.
pg_upgrade checks the version of the executables it uses, like pg_ctl, postgres, and
pg_resetwal. I felt this tool should do so as well.

03. get_bin_directory
```
    if (find_my_exec(path, full_path) < 0)
    {
        pg_log_error("The program \"%s\" is needed by %s but was not found in the\n"
                     "same directory as \"%s\".\n",
                     "pg_ctl", progname, full_path);
```

s/"pg_ctl"/progname

04.
Missing canonicalize_path()?

05.
Assume that the target server is a cascade standby, i.e., it also has a role as
a primary for another node. In this case, I thought the child node would not work, because
pg_createsubscriber runs pg_resetwal and all WAL files would be discarded at that
time. I have not tested it, but should the program detect this and exit earlier?

06.
wait_for_end_recovery() waits forever even if the standby has been disconnected
from the primary, right? Should we check the status of the replication via
pg_stat_wal_receiver?
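The suggested behavior can be sketched as a loop with two probes; in the real tool the probes would be SQL queries ("SELECT pg_is_in_recovery()" and a row check on pg_stat_wal_receiver), stubbed here for illustration only:

```c
#include <stdbool.h>

/*
 * Sketch of a recovery-wait loop that also bails out when the walreceiver
 * is gone, instead of waiting forever. Probe functions stand in for the
 * SQL queries the real tool would run between sleeps.
 */
typedef bool (*probe_fn) (void);

#define WAIT_PROMOTED		0
#define WAIT_DISCONNECTED	1
#define WAIT_TIMED_OUT		2

static int
wait_for_end_recovery_sketch(probe_fn in_recovery, probe_fn receiver_alive,
							 int max_attempts)
{
	for (int i = 0; i < max_attempts; i++)
	{
		if (!in_recovery())
			return WAIT_PROMOTED;
		if (!receiver_alive())
			return WAIT_DISCONNECTED;	/* don't wait forever */
		/* the real tool would sleep between probes */
	}
	return WAIT_TIMED_OUT;
}

/* stub probes for illustration */
static bool probe_true(void)  { return true; }
static bool probe_false(void) { return false; }
```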

07.
The cleanup function has a couple of bugs.

* If subscriptions have been created on the database, the function also tries to
  drop a publication. But it leads to an ERROR because the publication has already
  been dropped. See setup_subscriber().
* If the subscription has been created, drop_replication_slot() leads to an ERROR,
  because the slot was already dropped while executing DROP SUBSCRIPTION.

08.
I found that all messages (ERROR, WARNING, INFO, etc.) are output to stderr,
but I felt they should be on stdout. Is there a reason? pg_dump outputs messages to
stderr, but the motivation there might be to avoid confusion with dumps.

09.
I'm not sure the cleanup for the subscriber is really needed. Assume that there
are two databases, e.g., pg1 and pg2, and we fail to create a subscription on pg2.
This can happen when a subscription with the same name has already been
created on the primary server.
In this case the subscription on pg1 would be removed. But what is the next step?
Since the timeline ID on the standby server is larger than the primary's (note that
the standby has been promoted once), we cannot resume the physical replication
as-is. IIUC the easiest way to retry is removing the cluster and restarting
from pg_basebackup. If so, there is no need to clean up the standby because it is
corrupted. We just say "Please remove the cluster and recreate it again".

Here is a reproducer.

1. apply the txt patch atop 0001 patch.
2. run test_corruption.sh.
3. when you find the output below [1], connect to testdb from another terminal and
   run CREATE SUBSCRIPTION for the same subscription on the primary
4. Finally, pg_createsubscriber would fail the creation.

I also attached server logs of both nodes and the output.
Note again that this is a real issue. I used a tricky way to ensure the names overlap,
but this can happen randomly.

10.
While investigating #09, I found that we cannot properly report the reason why the
subscription cannot be created. The output said:

```
pg_createsubscriber: error: could not create subscription "pg_createsubscriber_16389_3884" on database "testdb": out of
memory
```

But the standby serverlog said:

```
ERROR:  subscription "pg_createsubscriber_16389_3884" already exists
STATEMENT:  CREATE SUBSCRIPTION pg_createsubscriber_16389_3884 CONNECTION 'user=postgres port=5431 dbname=testdb'
PUBLICATION pg_createsubscriber_16389 WITH (create_slot = false, copy_data = false, enabled = false)
 
```

[1]
```
pg_createsubscriber: creating the replication slot "pg_createsubscriber_16389_3884" on database "testdb"
pg_createsubscriber: XXX: sleep 20s
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

Since the original author seems a bit busy, I updated the patch set.

> 
> 01.
> ```
> /* Options */
> static const char *progname;
> 
> static char *primary_slot_name = NULL;
> static bool dry_run = false;
> 
> static bool success = false;
> 
> static LogicalRepInfo *dbinfo;
> static int    num_dbs = 0;
> ```
> 
> The comment seems out-of-date. There is only one option.

Changed the comment to /* Global variables */.

> 
> 02. check_subscriber and check_publisher
> 
> Missing pg_catalog prefix in some lines.

This has been already addressed in v18.

> 03. get_base_conninfo
> 
> I think dbname would not be set. IIUC, dbname should be a pointer to a pointer.

This has been already addressed in v18.

> 04.
> 
> I check the coverage and found two functions have been never called:
>  - drop_subscription
>  - drop_replication_slot
> 
> Also, some cases were not tested. The bullets below show notable ones for me.
> (Some of them would not be needed based on discussions)
> 
> * -r is specified
> * -t is specified
> * -P option contains dbname
> * -d is not specified
> * GUC settings are wrong
> * primary_slot_name is specified on the standby
> * standby server is not working
> 
> At the feature level, we may be able to check that the server log is surely removed in case
> of success.
> 
> So, which tests should be added? drop_subscription() is called only in the
> cleanup phase, so it may be difficult to test. According to the coverage, it seems that
> -r and -t are not tested. GUC settings have many test cases, so I'm not sure they
> should be added. Based on this, the others can be tested.

This has been already addressed in v18.

PSA my top-up patch set.

V19-0001: same as Euler's patch, v17-0001.
V19-0002: Update docs per recent changes. Also, some adjustments were done.
V19-0003: Modify the alignment of codes. Mostly same as v18-0002.
V19-0004: Change an argument of get_base_conninfo. Same as v18-0003.
=== experimental patches ===
V19-0005: Add testcases. Same as v18-0004.
V19-0006: Update a comment above global variables.
V19-0007: Address comments from Vignesh.
V19-0008: Fix error message in get_bin_directory().
V19-0009: Remove -P option. Same as v18-0005. 

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

Since the original author seems a bit busy, I updated the patch set.

> 1) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_replslot = false; should be there else there will be an
> error during  drop replication slot cleanup in error flow:
> +static void
> +drop_replication_slot(PGconn *conn, LogicalRepInfo *dbinfo, const
> char *slot_name)
> +{
> +       PQExpBuffer str = createPQExpBuffer();
> +       PGresult   *res;
> +
> +       Assert(conn != NULL);
> +
> +       pg_log_info("dropping the replication slot \"%s\" on database
> \"%s\"", slot_name, dbinfo->dbname);
> +
> +       appendPQExpBuffer(str, "SELECT
> pg_drop_replication_slot('%s')", slot_name);
> +
> +       pg_log_debug("command is: %s", str->data);

Fixed.

> 2) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_publication = false; should be there else there will be
> an error during drop publication cleanup in error flow:
> +/*
> + * Remove publication if it couldn't finish all steps.
> + */
> +static void
> +drop_publication(PGconn *conn, LogicalRepInfo *dbinfo)
> +{
> +       PQExpBuffer str = createPQExpBuffer();
> +       PGresult   *res;
> +
> +       Assert(conn != NULL);
> +
> +       pg_log_info("dropping publication \"%s\" on database \"%s\"",
> dbinfo->pubname, dbinfo->dbname);
> +
> +       appendPQExpBuffer(str, "DROP PUBLICATION %s", dbinfo->pubname);
> +
> +       pg_log_debug("command is: %s", str->data);
> 
> 3) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_subscription = false; should be there else there will be
> an error during drop publication cleanup in error flow:
> +/*
> + * Remove subscription if it couldn't finish all steps.
> + */
> +static void
> +drop_subscription(PGconn *conn, LogicalRepInfo *dbinfo)
> +{
> +       PQExpBuffer str = createPQExpBuffer();
> +       PGresult   *res;
> +
> +       Assert(conn != NULL);
> +
> +       pg_log_info("dropping subscription \"%s\" on database \"%s\"",
> dbinfo->subname, dbinfo->dbname);
> +
> +       appendPQExpBuffer(str, "DROP SUBSCRIPTION %s",
> dbinfo->subname);
> +
> +       pg_log_debug("command is: %s", str->data);

Fixed.

> 4) I was not sure if drop_publication is required here, as we will not
> create any publication in subscriber node:
> +               if (dbinfo[i].made_subscription)
> +               {
> +                       conn = connect_database(dbinfo[i].subconninfo);
> +                       if (conn != NULL)
> +                       {
> +                               drop_subscription(conn, &dbinfo[i]);
> +                               if (dbinfo[i].made_publication)
> +                                       drop_publication(conn, &dbinfo[i]);
> +                               disconnect_database(conn);
> +                       }
> +               }

Removed. But I'm not sure the cleanup is really meaningful.
See [1].

> 5) The connection should be disconnected in case of error case:
> +       /* secure search_path */
> +       res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
> +       if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +       {
> +               pg_log_error("could not clear search_path: %s",
> PQresultErrorMessage(res));
> +               return NULL;
> +       }
> +       PQclear(res);

PQfinish() was added.

> 6) There should be a line break before postgres_fe inclusion, to keep
> it consistent:
> + *-------------------------------------------------------------------------
> + */
> +#include "postgres_fe.h"
> +
> +#include <signal.h>

Added.

> 7) These includes are not required:
> 7.a) #include <signal.h>
> 7.b) #include <sys/stat.h>
> 7.c) #include <sys/wait.h>
> 7.d) #include "access/xlogdefs.h"
> 7.e) #include "catalog/pg_control.h"
> 7.f) #include "common/file_utils.h"
> 7.g) #include "utils/pidfile.h"

Removed.

> + *             src/bin/pg_basebackup/pg_createsubscriber.c
> + *
> + *-------------------------------------------------------------------------
> + */
> +#include "postgres_fe.h"
> +
> +#include <signal.h>
> +#include <sys/stat.h>
> +#include <sys/time.h>
> +#include <sys/wait.h>
> +#include <time.h>
> +
> +#include "access/xlogdefs.h"
> +#include "catalog/pg_authid_d.h"
> +#include "catalog/pg_control.h"
> +#include "common/connect.h"
> +#include "common/controldata_utils.h"
> +#include "common/file_perm.h"
> +#include "common/file_utils.h"
> +#include "common/logging.h"
> +#include "common/restricted_token.h"
> +#include "fe_utils/recovery_gen.h"
> +#include "fe_utils/simple_list.h"
> +#include "getopt_long.h"
> +#include "utils/pidfile.h"
> 
> 8) POSTMASTER_STANDBY and POSTMASTER_FAILED are not being used, is it
> required or kept for future purpose:
> +enum WaitPMResult
> +{
> +       POSTMASTER_READY,
> +       POSTMASTER_STANDBY,
> +       POSTMASTER_STILL_STARTING,
> +       POSTMASTER_FAILED
> +};

I think they are there because WaitPMResult was just ported from pg_ctl.c.
I renamed the enumeration and removed the unnecessary values.

> 9) pg_createsubscriber should be kept after pg_basebackup to maintain
> the consistent order:
> diff --git a/src/bin/pg_basebackup/.gitignore
> b/src/bin/pg_basebackup/.gitignore
> index 26048bdbd8..b3a6f5a2fe 100644
> --- a/src/bin/pg_basebackup/.gitignore
> +++ b/src/bin/pg_basebackup/.gitignore
> @@ -1,5 +1,6 @@
>  /pg_basebackup
>  /pg_receivewal
>  /pg_recvlogical
> +/pg_createsubscriber

Addressed.

> 10) dry-run help message is not very clear, how about something
> similar to pg_upgrade's message like "check clusters only, don't
> change any data":
> +       printf(_(" -d, --database=DBNAME               database to
> create a subscription\n"));
> +       printf(_(" -n, --dry-run                       stop before
> modifying anything\n"));
> +       printf(_(" -t, --recovery-timeout=SECS         seconds to wait
> for recovery to end\n"));
> +       printf(_(" -r, --retain                        retain log file
> after success\n"));
> +       printf(_(" -v, --verbose                       output verbose
> messages\n"));
> +       printf(_(" -V, --version                       output version
> information, then exit\n"));

Changed.

New patch is available in [2].

[1]:
https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077A6BB424A025F04A8243DF54F2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

I've replied for traceability.

> Further comments for v17.
> 
> 01.
> This program assumes that the target server has the same major version as this tool,
> because the target server will be restarted by the same version's pg_ctl command.
> I felt it should be ensured by reading the PG_VERSION.

Still investigating.

> 02.
> pg_upgrade checked the version of using executables, like pg_ctl, postgres, and
> pg_resetwal. I felt it should be as well.

Still investigating.

> 03. get_bin_directory
> ```
>     if (find_my_exec(path, full_path) < 0)
>     {
>         pg_log_error("The program \"%s\" is needed by %s but was not
> found in the\n"
>                      "same directory as \"%s\".\n",
>                      "pg_ctl", progname, full_path);
> ```
> 
> s/"pg_ctl"/progname

The message was updated.

> 04.
> Missing canonicalize_path()?

I found that find_my_exec() calls canonicalize_path(). No need to do it here.

> 05.
> Assuming that the target server is a cascade standby, i.e., it has a role as
> another primary. In this case, I thought the child node would not work. Because
> pg_createsubscriber runs pg_resetwal and all WAL files would be discarded at that
> time. I have not tested, but should the program detect it and exit earlier?

Still investigating.

> 06.
> wait_for_end_recovery() waits forever even if the standby has been disconnected
> from the primary, right? should we check the status of the replication via
> pg_stat_wal_receiver?

Still investigating.

> 07.
> The cleanup function has couple of bugs.
> 
> * If subscriptions have been created on the database, the function also tries to
>   drop a publication. But it leads an ERROR because it has been already dropped.
>   See setup_subscriber().
> * If the subscription has been created, drop_replication_slot() leads an ERROR.
>   Because the subscriber tried to drop the subscription while executing DROP
> SUBSCRIPTION.

Only drop_publication() was removed.

> 08.
> I found that all messages (ERROR, WARNING, INFO, etc...) would output to stderr,
> but I felt it should be on stdout. Is there a reason? pg_dump outputs messages to
> stderr, but the motivation might be to avoid confusion with dumps.

Still investigating.

> 09.
> I'm not sure the cleanup for subscriber is really needed. Assuming that there
> are two databases, e.g., pg1 pg2 , and we fail to create a subscription on pg2.
> This can happen when the subscription which has the same name has been
> already
> created on the primary server.
> In this case a subscription on pg1 would be removed. But what is the next step?
> Since a timelineID on the standby server is larger than the primary (note that
> the standby has been promoted once), we cannot resume the physical replication
> as-is. IIUC the easiest method to retry is removing a cluster once and restarting
> from pg_basebackup. If so, no need to cleanup the standby because it is
> corrupted.
> We just say "Please remove the cluster and recreate again".

I still think it should be, but not done yet.

New patch can be available in [1].

[1]:
https://www.postgresql.org/message-id/TYCPR01MB12077A6BB424A025F04A8243DF54F2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

Thanks for testing the patch!

> 
> I tried verifying few scenarios by using 5 databases and came across
> the following errors:
> 
> ./pg_createsubscriber -D ../new_standby -P "host=localhost port=5432
> dbname=postgres" -S "host=localhost port=9000 dbname=postgres"  -d db1
> -d db2 -d db3 -d db4 -d db5
> 
> pg_createsubscriber: error: publisher requires 6 wal sender
> processes, but only 5 remain
> pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 7.
> 
> It is successful only with 7 wal senders, so we can change error
> messages accordingly.
> 
> 
> pg_createsubscriber: error: publisher requires 6 replication slots,
> but only 5 remain
> pg_createsubscriber: hint: Consider increasing max_replication_slots
> to at least 7.
> 
> It is successful only with 7 replication slots, so we can change error
> messages accordingly.

I'm not the original author, but I don't think that is needed. The hint message
already suggests changing the value to 7. According to the docs [1], the primary
message should be factual and the hint message should be used for suggestions. I felt
the current code follows that style. Thoughts?

New patch is available in [2].

[1]: https://www.postgresql.org/docs/devel/error-style-guide.html
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077A6BB424A025F04A8243DF54F2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Feb 8, 2024, at 12:04 AM, Hayato Kuroda (Fujitsu) wrote:
>
Remember the target server was a standby (read only access). I don't expect an
application trying to modify it; unless it is a buggy application.
>

What if the client modifies the data just after the promotion?
Naively considered, all the changes can be accepted, but are there any issues?

If someone modifies data after promotion, fine; she has to deal with conflicts,
if any. IMO it is solved adding one or two sentences in the documentation.

>
Regarding
GUCs, almost all of them are PGC_POSTMASTER (so they cannot be modified unless the
server is restarted). The ones that are not PGC_POSTMASTER do not affect the
pg_createsubscriber execution [1].
>

IIUC, primary_conninfo and primary_slot_name are PGC_SIGHUP.

Ditto.

>
I'm just pointing out that this case is different from pg_upgrade (from which
this idea was taken). I'm not saying that's a bad idea. I'm just arguing that
you might be preventing some read-only access (monitoring) when it is
perfectly fine to connect to the database and execute queries. As I said
before, the current UI allows anyone to set up the standby to accept only local
connections. Of course, it is an extra step but it is possible. However, once
you apply v16-0007, there is no option but to use only local connections during the
transformation. Is it an acceptable limitation?
>

My remaining concern is written above. If they are not problematic, we may not have
to restrict them for now. At that time, the changes

1) overwriting a port number,
2) setting listen_addresses = ''

It can be implemented later if people are excited by it.

are not needed, right? IIUC the inconsistency of -P may still be problematic.

I still think we shouldn't have only the transformed primary_conninfo as
option.

>
pglogical_create_subscriber does nothing [2][3].
>

Oh, thanks.
Just to confirm - pglogical sets shared_preload_libraries to ''; should we follow that or not?

The in-core logical replication does not require any library to be loaded.


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Fri, Feb 9, 2024, at 3:27 AM, vignesh C wrote:
On Wed, 7 Feb 2024 at 10:24, Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote:
>
> Thanks for updating the patch!

Thanks for the updated patch, few comments:
Few comments:
1) Cleanup function handler flag should be reset, i.e.
dbinfo->made_replslot = false; should be there else there will be an
error during  drop replication slot cleanup in error flow:

Why? drop_replication_slot() is basically called by atexit().

2) Cleanup function handler flag should be reset, i.e.
dbinfo->made_publication = false; should be there else there will be
an error during drop publication cleanup in error flow:

Ditto. drop_publication() is basically called by atexit().


3) Cleanup function handler flag should be reset, i.e.
dbinfo->made_subscription = false; should be there else there will be
an error during drop publication cleanup in error flow:

Ditto. drop_subscription() is only called by atexit().

4) I was not sure if drop_publication is required here, as we will not
create any publication in subscriber node:
+               if (dbinfo[i].made_subscription)
+               {
+                       conn = connect_database(dbinfo[i].subconninfo);
+                       if (conn != NULL)
+                       {
+                               drop_subscription(conn, &dbinfo[i]);
+                               if (dbinfo[i].made_publication)
+                                       drop_publication(conn, &dbinfo[i]);
+                               disconnect_database(conn);
+                       }
+               }

setup_subscriber() explains the reason.

        /*
         * Since the publication was created before the consistent LSN, it is
         * available on the subscriber when the physical replica is promoted.
         * Remove publications from the subscriber because it has no use.
         */
        drop_publication(conn, &dbinfo[i]);

I changed the referred code a bit because it was not reliable. Since
made_subscription is not set until we create the subscription, the
publications that were created on the primary and replicated to the standby won't be
removed on the subscriber. Instead, it should rely on the recovery state to decide
whether to drop them.

5) The connection should be disconnected in case of error case:
+       /* secure search_path */
+       res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not clear search_path: %s",
PQresultErrorMessage(res));
+               return NULL;
+       }
+       PQclear(res);

connect_database() is usually followed by a NULL test and exit(1) if it cannot
connect. It should be added for correctness, but it is not a requirement.

7) These includes are not required:
7.a) #include <signal.h>
7.b) #include <sys/stat.h>
7.c) #include <sys/wait.h>
7.d) #include "access/xlogdefs.h"
7.e) #include "catalog/pg_control.h"
7.f) #include "common/file_utils.h"
7.g) #include "utils/pidfile.h"

Good catch. I was about to review the include files.

8) POSTMASTER_STANDBY and POSTMASTER_FAILED are not being used, is it
required or kept for future purpose:
+enum WaitPMResult
+{
+       POSTMASTER_READY,
+       POSTMASTER_STANDBY,
+       POSTMASTER_STILL_STARTING,
+       POSTMASTER_FAILED
+};

I just copied verbatim from pg_ctl. We should remove the superfluous states.

9) pg_createsubscriber should be kept after pg_basebackup to maintain
the consistent order:

Ok.


10) dry-run help message is not very clear, how about something
similar to pg_upgrade's message like "check clusters only, don't
change any data":

$ /tmp/pgdevel/bin/pg_archivecleanup --help | grep dry-run
  -n, --dry-run               dry run, show the names of the files that would be
$ /tmp/pgdevel/bin/pg_combinebackup --help | grep dry-run
  -n, --dry-run             don't actually do anything
$ /tmp/pgdevel/bin/pg_resetwal --help | grep dry-run
  -n, --dry-run          no update, just show what would be done
$ /tmp/pgdevel/bin/pg_rewind --help | grep dry-run
  -n, --dry-run                  stop before modifying anything
$ /tmp/pgdevel/bin/pg_upgrade --help | grep check  
  -c, --check                   check clusters only, don't change any data

I used the same sentence as pg_rewind but I'm fine with pg_upgrade or
pg_combinebackup sentences.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

>
If someone modifies data after promotion, fine; she has to deal with conflicts,
if any. IMO it is solved adding one or two sentences in the documentation.
>

OK. I could not find any issues, for now.

>
> >
> Regarding
> GUCs, almost all of them are PGC_POSTMASTER (so they cannot be modified unless the
> server is restarted). The ones that are not PGC_POSTMASTER do not affect the
> pg_createsubscriber execution [1].
> >
> 
> IIUC, primary_conninfo and primary_slot_name are PGC_SIGHUP.

Ditto.
>

Just to confirm - even if primary_slot_name is changed during
the conversion, the slot that was initially set will be dropped. Currently we do not
see any issues.

>
> >
> I'm just pointing out that this case is different from pg_upgrade (from which
> this idea was taken). I'm not saying that's a bad idea. I'm just arguing that
> you might be preventing some read-only access (monitoring) when it is
> perfectly fine to connect to the database and execute queries. As I said
> before, the current UI allows anyone to set up the standby to accept only local
> connections. Of course, it is an extra step but it is possible. However, once
> you apply v16-0007, there is no option but to use only local connections during the
> transformation. Is it an acceptable limitation?
> >
> 
> My remaining concern is written above. If they are not problematic, we may not have
> to restrict them for now. At that time, the changes
> 
> 1) overwriting a port number,
> 2) setting listen_addresses = ''

It can be implemented later if people are excited by it.

> are not needed, right? IIUC the inconsistency of -P may still be problematic.

I still think we shouldn't have only the transformed primary_conninfo as
option.
>


Hmm, OK. So let me summarize the current status and discussions.

Policy)

Basically, we do not prohibit connections to the primary/standby.
primary_slot_name may be changed during the conversion and
tuples may be inserted on the target just after the promotion, but there seem to be no issues.

API)

-D (data directory) and -d (databases) are definitely needed.

Regarding -P (a connection string for the source), we can require it for now.
But note that it may cause an inconsistency if the node pointed to by -P is different
from the node pointed to by primary_conninfo.

As for the connection string for the target server, we can choose between two ways:
a)
Accept a native connection string as -S. This can reuse the same parsing mechanism as -P,
but leaves room for a non-local server to be specified.

b)
Accept username/port as -U/-p.
(Given the policy above, listen_addresses would not be overwritten; the port just
specifies the listening port.)
This avoids connecting to a non-local server, but more options may be needed.
(E.g., is a socket directory needed? What about a password?)

Other discussion points / reported issues)

Points raised by me [1] are not solved yet.

* What if the target version is PG16 or earlier?
* What if the found executables have a different version from pg_createsubscriber?
* What if the target is sending WAL to another server?
   I.e., there are clusters like `node1->node2->node3`, and the target is node2.
* Can we really clean up the standby in case of failure?
   Shouldn't we suggest removing the target instead?
* Can we move outputs to stdout?

[1]:
https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Thu, Feb 15, 2024 at 10:37 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Euler,
>
> Here are my minor comments for 17.
>
> 01.
> ```
> /* Options */
> static const char *progname;
>
> static char *primary_slot_name = NULL;
> static bool dry_run = false;
>
> static bool success = false;
>
> static LogicalRepInfo *dbinfo;
> static int      num_dbs = 0;
> ```
>
> The comment seems out-of-date. There is only one option.
>
> 02. check_subscriber and check_publisher
>
> Missing pg_catalog prefix in some lines.
>
> 03. get_base_conninfo
>
> I think dbname would not be set. IIUC, dbname should be a pointer of the pointer.
>
>
> 04.
>
> I check the coverage and found two functions have been never called:
>  - drop_subscription
>  - drop_replication_slot
>
> Also, some cases were not tested. Below bullet showed notable ones for me.
> (Some of them would not be needed based on discussions)
>
> * -r is specified
> * -t is specified
> * -P option contains dbname
> * -d is not specified
> * GUC settings are wrong
> * primary_slot_name is specified on the standby
> * standby server is not working
>
> In feature level, we may able to check the server log is surely removed in case
> of success.
>
> So, which tests should be added? drop_subscription() is called only when the
> cleanup phase, so it may be difficult to test. According to others, it seems that
> -r and -t are not tested. GUC-settings have many test cases so not sure they
> should be. Based on this, others can be tested.
>
> PSA my top-up patch set.
>
> V18-0001: same as your patch, v17-0001.
> V18-0002: modify the alignment of codes.
> V18-0003: change an argument of get_base_conninfo. Per comment 3.
> === experimental patches ===
> V18-0004: Add testcases per comment 4.
> V18-0005: Remove -P option. I'm not sure it should be needed, but I made just in case.

I created a cascaded physical replication setup like
node1->node2->node3 and ran pg_createsubscriber on node2. After
running the script, I started node2 again and found that the
pg_createsubscriber command had succeeded, but the physical
replication between node2 and node3 was broken. I feel
pg_createsubscriber should check for this scenario and throw an error
in this case to avoid breaking the cascaded replication setup. I have
attached the script that was used to verify this.

Thanks and Regards,
Shubham Khanna.

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler, 

> Policy)
> 
> Basically, we do not prohibit connections to the primary/standby.
> primary_slot_name may be changed during the conversion and
> tuples may be inserted on the target just after the promotion, but that does not seem to cause issues.
> 
> API)
> 
> -D (data directory) and -d (databases) are definitely needed.
> 
> Regarding -P (the connection string for the source), we can require it for now.
> But note that it may cause an inconsistency if the node pointed to by -P is
> different from the node pointed to by primary_conninfo.
> 
> As for the connection string for the target server, we can choose between two ways:
> a)
> Accept a native connection string as -S. This can reuse the same parsing
> mechanism as -P,
> but leaves room for a non-local server to be specified.
> 
> b)
> Accept username/port as -U/-p.
> (Given the policy above, listen_addresses would not be overwritten; the port
> just specifies the listening port.)
> This avoids connecting to a non-local server, but more options may be needed.
> (E.g., is a socket directory needed? What about a password?)
> 
> Other discussion points / reported issues)
> 
> Points raised by me [1] are not solved yet.
> 
> * What if the target version is PG16 or earlier?
> * What if the found executables have a different version from pg_createsubscriber?
> * What if the target is sending WAL to another server?
>    I.e., there are clusters like `node1->node2->node3`, and the target is node2.
> * Can we really clean up the standby in case of failure?
>    Shouldn't we suggest removing the target instead?
> * Can we move outputs to stdout?

Based on the discussion, I updated the patch set. Feel free to pick these up and include them.
The patch removing -P was dropped, but the one removing -S still remains.

Also, while testing the patch set, I found some issues.

1.
Cfbot got angry [1]. This is because WIFEXITED and others are defined in <sys/wait.h>,
but the inclusion was removed per comment. Added the inclusion again.

2.
As Shubham pointed out [3], when we convert an intermediate node of a cascading replication
setup, the last node gets stuck. This is because a walreceiver process requires that nodes
have the same system identifier (in WalReceiverMain), but it is changed by pg_createsubscriber.

3.
Moreover, when we convert the last node of a cascade, it won't work either, because we
cannot create publications on a standby node.

4.
If the standby server was initialized by PG16 or earlier, this command would fail,
because the API of pg_logical_create_replication_slot() was changed.

5.
Also, the pg_ctl command used must have the same version as the instance.
I think we should require that all the executables and servers be the same major version.

Based on the above, here is a description of the attached patches:

V20-0001: same as Euler's patch, v17-0001.
V20-0002: Update docs per recent changes. Same as v19-0002
V20-0003: Modify the alignment of codes. Same as v19-0003
V20-0004: Change an argument of get_base_conninfo. Same as v19-0004
=== experimental patches ===
V20-0005: Add testcases. Same as v19-0004
V20-0006: Update a comment above global variables. Same as v19-0005
V20-0007: Address comments from Vignesh. Some parts you don't like
          are reverted.
V20-0008: Fix error message in get_bin_directory(). Same as v19-0008
V20-0009: Remove -S option. Refactored from v16-0007
V20-0010: Add check versions of executables and the target, per above and [4]
V20-0011: Detect a disconnection while waiting for the recovery, per [4]
V20-0012: Avoid running pg_createsubscriber for cascaded physical replication, per above.

[1]: https://cirrus-ci.com/task/4619792833839104
[2]: https://www.postgresql.org/message-id/CALDaNm1r9ZOwZamYsh6MHzb%3D_XvhjC_5XnTAsVecANvU9FOz6w%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAHv8RjJcUY23ieJc5xqg6-QeGr1Ppp4Jwbu7Mq29eqCBTDWfUw%40mail.gmail.com
[4]:
https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Feb 15, 2024, at 8:23 AM, Hayato Kuroda (Fujitsu) wrote:
> Points raised by me [1] are not solved yet.

> * What if the target version is PG16-?

pg_ctl and pg_resetwal won't work.

$ pg_ctl start -D /tmp/blah
waiting for server to start....
2024-02-15 23:50:03.448 -03 [364610] FATAL:  database files are incompatible with server
2024-02-15 23:50:03.448 -03 [364610] DETAIL:  The data directory was initialized by PostgreSQL version 16, which is not compatible with this version 17devel.
stopped waiting
pg_ctl: could not start server
Examine the log output.

$ pg_resetwal -D /tmp/blah
pg_resetwal: error: data directory is of wrong version
pg_resetwal: detail: File "PG_VERSION" contains "16", which is not compatible with this program's version "17".
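The failure mode above could also be caught up front by reading PG_VERSION from the target data directory. A minimal sketch of such a check, assuming a hypothetical TOOL_MAJOR_VERSION constant (the real tool would compare against its own compiled-in major version, e.g. a PG_MAJORVERSION_NUM-style value):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tool's compiled-in major version. */
#define TOOL_MAJOR_VERSION 17

/*
 * Parse the major version from the contents of a PG_VERSION file
 * (e.g. "16\n" or "9.6\n"). Returns -1 if the contents are unparsable.
 */
static int
parse_pg_version(const char *contents)
{
    char   *end;
    long    major = strtol(contents, &end, 10);

    if (end == contents || major <= 0)
        return -1;              /* not a version number */
    return (int) major;
}

/* True only when the data directory matches the tool's major version. */
static int
version_is_compatible(const char *pg_version_contents)
{
    return parse_pg_version(pg_version_contents) == TOOL_MAJOR_VERSION;
}
```

This mirrors what pg_resetwal itself does with PG_VERSION, but lets pg_createsubscriber report the mismatch before starting any server.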

> * What if the found executables have a different version from pg_createsubscriber?

The new code takes care of it.

> * What if the target is sending WAL to another server?
>    I.e., there are clusters like `node1->node2->node3`, and the target is node2.

The new code detects if the server is in recovery and aborts as you suggested.
A new option can be added to ignore the fact there are servers receiving WAL
from it.

> * Can we really cleanup the standby in case of failure?
>    Shouldn't we suggest to remove the target once?

If it finishes the promotion, no. I adjusted the cleanup routine a bit to avoid
it. However, we should provide instructions informing the user that they should
create a fresh standby and try again.

> * Can we move outputs to stdout?

Are you suggesting to use another logging framework? It is not a good idea
because each client program is already using common/logging.c.

1.
Cfbot got angry [1]. This is because WIFEXITED and others are defined in <sys/wait.h>,
but the inclusion was removed per comment. Added the inclusion again.

Ok.

2.
As Shubham pointed out [3], when we convert an intermediate node of a cascading replication
setup, the last node gets stuck. This is because a walreceiver process requires that nodes
have the same system identifier (in WalReceiverMain), but it is changed by pg_createsubscriber.

Hopefully it was fixed.

3.
Moreover, when we convert the last node of a cascade, it won't work either, because we
cannot create publications on a standby node.

Ditto.

4.
If the standby server was initialized by PG16 or earlier, this command would fail,
because the API of pg_logical_create_replication_slot() was changed.

See comment above.

5.
Also, the pg_ctl command used must have the same version as the instance.
I think we should require that all the executables and servers be the same major version.

It is enforced by the new code. See find_other_exec() in get_exec_path().

Based on the above, here is a description of the attached patches:

Thanks for another review. I'm sharing a new patch to merge a bunch of
improvements and fixes. Comments are below.

v20-0002: I did some extensive documentation changes (including some of them
related to the changes in the new patch). I will defer updating it until I check
v20-0002. It will be included in the next version.

v20-0003: I included most of it. There are a few things that pgindent reverted,
so I didn't apply those. I also didn't like some SQL commands that were broken into
multiple lines with spaces at the beginning. It looks nice in the code but not
in the output.

v20-0004: Nice catch. Applied.

v20-0005: Applied.

v20-0006: I prefer to remove the comment.

v20-0007: I partially applied it. I only removed the states that were not used
and proposed another dry-run mode message. Maybe it is clearer than it was.

v20-0008: I refactored the get_bin_directory code. On reflection, I reverted
the unified binary directory approach that we agreed on a few days ago. The main reason is
to provide a specific error message for each program it is using. The
get_exec_path will check if the program is available in the same directory as
pg_createsubscriber and if it has the same version. An absolute path is
returned and is used by some functions.

v20-0009: to be reviewed.

v20-0010: As I said above, this code was refactored so I didn't apply this one.

v20-0011: Do we really want to interrupt the recovery if the network was
momentarily interrupted or if the OS killed the walsender? Recovery is critical for
the process. I think we should do our best to be resilient and deliver all
changes required by the new subscriber. The proposal is not correct because the
query returns no tuples if it is disconnected, so you cannot PQgetvalue(). The
retry interval (the time until ServerLoop() creates another walreceiver) is
determined by DetermineSleepTime() and is a maximum of 5 seconds
(SIGKILL_CHILDREN_AFTER_SECS). One idea is to retry 2 or 3 times before giving up
on the pg_stat_wal_receiver query. Do you have a better plan?
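The retry idea reduces to a small piece of bookkeeping: tolerate up to N consecutive checks in which the walreceiver is missing, resetting the counter whenever it reappears. A sketch of just that decision logic (names and the once-per-second cadence are assumptions, not the patch's actual code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the proposed retry policy: alive[i] records whether the
 * walreceiver (per pg_stat_wal_receiver) was running at check i, with
 * checks roughly once per second; recovery finishes after n checks.
 * We give up only after max_misses consecutive checks without a
 * walreceiver, since the postmaster normally restarts it within a few
 * seconds.
 */
static bool
survives_until_recovery(const bool *alive, int n, int max_misses)
{
    int misses = 0;

    for (int i = 0; i < n; i++)
    {
        if (alive[i])
            misses = 0;         /* walreceiver came back: reset the count */
        else if (++misses >= max_misses)
            return false;       /* gone too long: abort the wait */
    }
    return true;                /* recovery ended before we gave up */
}
```

With max_misses around 5-10, a transient kill-and-restart of the walreceiver would not abort the conversion, while a permanently broken connection still would.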

v20-0012: I applied a different patch to accomplish the same thing. I included
a refactoring around the pg_is_in_recovery() function so it can be used in two other places.

Besides that, I changed some SQL commands to avoid superfluous
whitespace in them. I also added a test for the cascaded replication scenario. And
cleaned up the 041 test a bit.

I didn't provide an updated documentation because I want to check v20-0002. It
is on my list to check v20-0009.


--
Euler Taveira

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch!
Before reviewing deeply, here are replies for your comments.

>
> > Points raised by me [1] are not solved yet.
> > 
> > * What if the target version is PG16-?
pg_ctl and pg_resetwal won't work.
$ pg_ctl start -D /tmp/blah
waiting for server to start....
2024-02-15 23:50:03.448 -03 [364610] FATAL:  database files are incompatible with server
2024-02-15 23:50:03.448 -03 [364610] DETAIL:  The data directory was initialized by PostgreSQL version 16, which is not compatible with this version 17devel.
stopped waiting
pg_ctl: could not start server
Examine the log output.

$ pg_resetwal -D /tmp/blah
pg_resetwal: error: data directory is of wrong version
pg_resetwal: detail: File "PG_VERSION" contains "16", which is not compatible with this program's version "17".

> > * What if the found executables have a different version from pg_createsubscriber?

The new code takes care of it.
>

I preferred to have a common path and test them one by one, but I agree this works well.
Let's keep it and hear opinions from others.

>
> > * What if the target is sending WAL to another server?
> >    I.e., there are clusters like `node1->node2->node3`, and the target is node2.
The new code detects if the server is in recovery and aborts as you suggested.
A new option can be added to ignore the fact there are servers receiving WAL
from it.
>

Confirmed it can detect.

>
> > * Can we really cleanup the standby in case of failure?
> >    Shouldn't we suggest to remove the target once?

If it finishes the promotion, no. I adjusted the cleanup routine a bit to avoid
it. However, we should provide instructions to inform the user that it should
create a fresh standby and try again.
>

I think the cleanup function is not sufficient. In v21, recovery_ended stays
true even after drop_publication() is done in setup_subscriber(). I think
made_subscription is not needed, and the function should output some messages
when recovery_ended is on.
Besides, in the case of pg_upgrade, it always reports the message below before starting
the migration. I think this is more helpful for users.

```
    pg_log(PG_REPORT, "\n"
           "If pg_upgrade fails after this point, you must re-initdb the\n"
           "new cluster before continuing.");
```

>
> > * Can we move outputs to stdout?

Are you suggesting to use another logging framework? It is not a good idea
because each client program is already using common/logging.c.
>

Hmm, indeed. Other programs in pg_basebackup seem to use the framework.

>
v20-0011: Do we really want to interrupt the recovery if the network was
momentarily interrupted or if the OS killed walsender? Recovery is critical for
the process. I think we should do our best to be resilient and deliver all
changes required by the new subscriber.
>

It might be too strict to raise an ERROR as soon as we encounter a disconnection.
Also, 0011 was wrong: it should be PQgetvalue(res, 0, 1) for still_alive.

>
The proposal is not correct because the
query return no tuples if it is disconnected so you cannot PQgetvalue().
>

Sorry for the misunderstanding, but you might be confused. pg_createsubscriber
sends a query to the target, and we are discussing the disconnection between the
target and source instances. I think the connection that pg_createsubscriber
holds is still alive, so PQgetvalue() can get a value.

(BTW, the callers of server_is_in_recovery() have not considered a disconnection from
the target...)

>
The
retry interval (the time that ServerLoop() will create another walreceiver) is
determined by DetermineSleepTime() and it is a maximum of 5 seconds
(SIGKILL_CHILDREN_AFTER_SECS). One idea is to retry 2 or 3 times before give up
using the pg_stat_wal_receiver query. Do you have a better plan?
>

It's good to determine a threshold. It can define the acceptable number of
walreceiver losses during the loop.
I think we should retry at least until the postmaster revives the walreceiver.
The check runs once per second, so more than 5 (or 10) may be better.
Thoughts?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Thu, Feb 15, 2024 at 4:53 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Euler,
>
> > Policy)
> >
> > Basically, we do not prohibit to connect to primary/standby.
> > primary_slot_name may be changed during the conversion and
> > tuples may be inserted on target just after the promotion, but it seems no issues.
> >
> > API)
> >
> > -D (data directory) and -d (databases) are definitively needed.
> >
> > Regarding -P (the connection string for the source), we can require it for now.
> > But note that it may cause an inconsistency if the node pointed to by -P is different
> > from the node pointed to by primary_conninfo.
> >
> > As for the connection string for the target server, we can choose two ways:
> > a)
> > accept native connection string as -S. This can reuse the same parsing
> > mechanism as -P,
> > but there is a room that non-local server is specified.
> >
> > b)
> > accept username/port as -U/-p
> > (Since the policy is like above, listen_addresses would not be overwritten. Also,
> > the port just specify the listening port).
> > This can avoid connecting to non-local, but more options may be needed.
> > (E.g., Is socket directory needed? What about password?)
> >
> > Other discussing point, reported issue)
> >
> > Points raised by me [1] are not solved yet.
> >
> > * What if the target version is PG16 or earlier?
> > * What if the found executables have a different version from pg_createsubscriber?
> > * What if the target is sending WAL to another server?
> >    I.e., there are clusters like `node1->node2->node3`, and the target is node2.
> > * Can we really clean up the standby in case of failure?
> >    Shouldn't we suggest removing the target instead?
> > * Can we move outputs to stdout?
>
> Based on the discussion, I updated the patch set. Feel free to pick them and include.
> Removing -P patch was removed, but removing -S still remained.
>
> Also, while testing the patch set, I found some issues.
>
> 1.
> Cfbot got angry [1]. This is because WIFEXITED and others are defined in <sys/wait.h>,
> but the inclusion was removed per comment. Added the inclusion again.
>
> 2.
> As Shubham pointed out [3], when we convert an intermediate node of a cascading replication
> setup, the last node gets stuck. This is because a walreceiver process requires that nodes
> have the same system identifier (in WalReceiverMain), but it is changed by pg_createsubscriber.
>
> 3.
> Moreover, when we convert a last node of cascade, it won't work well. Because we cannot create
> publications on the standby node.
>
> 4.
> If the standby server was initialized by PG16 or earlier, this command would fail,
> because the API of pg_logical_create_replication_slot() was changed.
>
> 5.
> Also, the pg_ctl command used must have the same version as the instance.
> I think we should require that all the executables and servers be the same major version.
>
> Based on them, below part describes attached ones:
>
> V20-0001: same as Euler's patch, v17-0001.
> V20-0002: Update docs per recent changes. Same as v19-0002
> V20-0003: Modify the alignment of codes. Same as v19-0003
> V20-0004: Change an argument of get_base_conninfo. Same as v19-0004
> === experimental patches ===
> V20-0005: Add testcases. Same as v19-0004
> V20-0006: Update a comment above global variables. Same as v19-0005
> V20-0007: Address comments from Vignesh. Some parts you don't like
>           are reverted.
> V20-0008: Fix error message in get_bin_directory(). Same as v19-0008
> V20-0009: Remove -S option. Refactored from v16-0007
> V20-0010: Add check versions of executables and the target, per above and [4]
> V20-0011: Detect a disconnection while waiting the recovery, per [4]
> V20-0012: Avoid running pg_createsubscriber for cascade physical replication, per above.
>
> [1]: https://cirrus-ci.com/task/4619792833839104
> [2]: https://www.postgresql.org/message-id/CALDaNm1r9ZOwZamYsh6MHzb%3D_XvhjC_5XnTAsVecANvU9FOz6w%40mail.gmail.com
> [3]: https://www.postgresql.org/message-id/CAHv8RjJcUY23ieJc5xqg6-QeGr1Ppp4Jwbu7Mq29eqCBTDWfUw%40mail.gmail.com
> [4]:
https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

I found a couple of issues while verifying the cascaded replication
with the following scenarios:
Scenario 1) Create cascaded replication like node1->node2->node3
without using replication slots (the attached
cascade_3node_setup_without_slots has the script for this):
Then I ran pg_createsubscriber specifying node1 as the primary and
node3 as the standby; this scenario ran successfully. I was not sure
whether this should be supported or not.
Scenario 2) Create cascaded replication like node1->node2->node3 using
replication slots (the attached cascade_3node_setup_with_slots has the
script for this):
Here, slot1 was used between node1 and node2, and slot2 between node2
and node3. Then I ran pg_createsubscriber specifying node1 as the
primary and node3 as the standby. In this case pg_createsubscriber fails
with the following error:
pg_createsubscriber: error: could not obtain replication slot
information: got 0 rows, expected 1 row
[Inferior 1 (process 2623483) exited with code 01]

This is failing because slot2 is used between node2 and node3, but
pg_createsubscriber checked for slot1, the slot used for replication
between node1 and node2.
Thoughts?

Thanks and Regards,
Shubham Khanna.

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

Thanks for testing. It seems you ran with the v20 patch set, but I confirmed
it can be reproduced with v21.

> 
> I found a couple of issues, while verifying the cascaded replication
> with the following scenarios:
> Scenario 1) Create cascade replication like node1->node2->node3
> without using replication slots (attached
> cascade_3node_setup_without_slots has the script for this):
> Then I ran pg_createsubscriber by specifying primary as node1 and
> standby as node3, this scenario runs successfully. I was not sure if
> this should be supported or not?

Hmm. After the script, the cascading setup would be broken. The replication topology would be:

```
Node1 -> node2
  |
Node3
```

And the operation is a bit strange. The consistent LSN is obtained from node1,
but node3 waits until it receives the record from node2.
Can this always succeed?

> Scenario 2) Create cascade replication like node1->node2->node3 using
> replication slots (attached cascade_3node_setup_with_slots has the
> script for this):
> Here, slot name was used as slot1 for node1 to node2 and slot2 for
> node2 to node3. Then I ran pg_createsubscriber by specifying primary
> as node1 and standby as node3. In this case pg_createsubscriber fails
> with the following error:
> pg_createsubscriber: error: could not obtain replication slot
> information: got 0 rows, expected 1 row
> [Inferior 1 (process 2623483) exited with code 01]
> 
> This is failing because slot name slot2 is used between node2->node3
> but pg_createsubscriber is checked for slot1, the slot which is used
> for replication between node1->node2.
> Thoughts?

Right. The inconsistency is quite strange.

Overall, I feel such a case must be rejected. How should we detect it in the checking phase?
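One detection idea (purely a sketch, with hypothetical names): during the checking phase, confirm that the standby's primary_slot_name exists among the physical slots reported by the -P server (e.g. from pg_replication_slots). In Shubham's scenario, node3's slot2 exists only on node2, so pointing -P at node1 would be rejected before any changes are made:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the standby's primary_slot_name is
 * among the physical slot names obtained from the -P server (e.g. via
 * "SELECT slot_name FROM pg_replication_slots WHERE slot_type = 'physical'").
 */
static bool
primary_has_slot(const char *primary_slot_name,
                 const char **source_slots, int nslots)
{
    if (primary_slot_name == NULL || primary_slot_name[0] == '\0')
        return true;            /* no slot configured: nothing to verify */

    for (int i = 0; i < nslots; i++)
        if (strcmp(source_slots[i], primary_slot_name) == 0)
            return true;

    return false;               /* -P likely points at the wrong node: reject */
}
```

This would not catch every -P/primary_conninfo mismatch (two nodes could coincidentally have same-named slots), but it rejects the reported inconsistency cheaply at check time.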

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Here are comments for v21.

01. main
```
    /* rudimentary check for a data directory. */
...
    /* subscriber PID file. */
```

The initial character must be uppercase, and the period is not needed.

02. check_data_directory
```
    snprintf(versionfile, MAXPGPATH, "%s/PG_VERSION", datadir);
```

You removed the version checking based on PG_VERSION, but I think it is still needed.
Indeed, v21 can detect that pg_ctl/pg_resetwal/pg_createsubscriber have a different
version, but it cannot detect that the started instance has a different version.
I.e., v20-0010 is partially needed.

03. store_pub_sub_info()
```
    SimpleStringListCell *cell;
```

This definition can be moved into the loop.

04. get_standby_sysid()
```
    pfree(cf);
```

This can be pg_pfree().

05. check_subscriber
```
    /* The target server must be a standby */
    if (server_is_in_recovery(conn) == 0)
    {
        pg_log_error("The target server is not a standby");
        return false;
    }
```

What if the function returns -1? Should we detect it (maybe the disconnection) here?

06. server_is_in_recovery
```
    ret = strcmp("t", PQgetvalue(res, 0, 0));

    PQclear(res);

    if (ret == 0)
        return 1;
    else if (ret > 0)
        return 0;
    else
        return -1;                /* should not happen */
```

But strcmp may return a negative value, right? Based on the comment atop the function,
we should not do that. I think we can use a ternary operator instead.

07. server_is_in_recovery

In the first place, no one considers that this returns -1. So can we change it to a bool
function and raise pg_fatal() in case of error?
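The suggested shape could look like the following sketch (illustrative only; the real function runs `SELECT pg_catalog.pg_is_in_recovery()` over libpq and would call pg_fatal() on query failure rather than abort()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Map the single "t"/"f" value returned by
 * "SELECT pg_catalog.pg_is_in_recovery()" to a bool. Anything else is
 * treated as a fatal error (abort() here stands in for pg_fatal()),
 * replacing the current -1 return value that no caller checks.
 */
static bool
recovery_flag_to_bool(const char *value)
{
    if (strcmp(value, "t") == 0)
        return true;
    if (strcmp(value, "f") == 0)
        return false;
    abort();                    /* should not happen; real code: pg_fatal() */
}
```

Making the function bool also removes the problematic three-way strcmp mapping discussed above.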

08. check_subscriber
```
    if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
    {
        pg_log_error("permission denied for function \"%s\"",
                     "pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
        return false;
    }
```

I think the third argument must be 2.

09. check_subscriber
```
    pg_log_debug("subscriber: primary_slot_name: %s", primary_slot_name);
```

The output seems strange if the primary_slot_name is not set.

10. setup_publisher()
```
    PGconn       *conn;
    PGresult   *res;
```

Definitions can be in the loop.

11. create_publication()
```
    if (PQntuples(res) == 1)
    {
        /*
         * If publication name already exists and puballtables is true, let's
         * use it. A previous run of pg_createsubscriber must have created
         * this publication. Bail out.
         */
```

Hmm, but pre-existing publications may not publish INSERT/UPDATE/DELETE/TRUNCATE.
They should be checked if we really want to reuse them.
(I think it is OK to just raise an ERROR.)

12. create_publication()

Based on the above, we do not have to check before creating publications. The publisher
can detect the duplication. I prefer that.

13. create_logical_replication_slot()
```
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            pg_log_error("could not create replication slot \"%s\" on database \"%s\": %s",
                         slot_name, dbinfo->dbname,
                         PQresultErrorMessage(res));
            return lsn;
        }
```

I know lsn is always NULL, but can we use `return NULL`?

14. setup_subscriber()
```
    PGconn       *conn;

```

This definition can be in the loop.


15.

You said that in case of failure, cleanup is not needed if the process exits soon [1].
But some functions call PQfinish() and then exit(1) or pg_fatal(). Should we follow that?

16.

Some places refer to PGresult or PGconn even after the cleanup. They must be fixed.
```
        PQclear(res);
        disconnect_database(conn);
        pg_fatal("could not get system identifier: %s",
                 PQresultErrorMessage(res));
```

I think this is the root cause of why the wrong error message is sometimes output.


17.

Some places call PQerrorMessage() and other places call PQresultErrorMessage().
I think PQerrorMessage() should be used only after the connection establishment
functions. Thoughts?

18. 041_pg_createsubscriber_standby.pl
```
use warnings;
```

We must set "FATAL = all";

19.
```
my $node_p;
my $node_f;
my $node_s;
my $node_c;
my $result;
my $slotname;
```

I could not find forward declarations in other perl files.
The node names might be a bit confusing, but I could not find better ones.

20.
```
# On node P
# - create databases
# - create test tables
# - insert a row
# - create a physical replication slot
$node_p->safe_psql(
    'postgres', q(
    CREATE DATABASE pg1;
    CREATE DATABASE pg2;
));
$node_p->safe_psql('pg1', 'CREATE TABLE tbl1 (a text)');
$node_p->safe_psql('pg1', "INSERT INTO tbl1 VALUES('first row')");
$node_p->safe_psql('pg2', 'CREATE TABLE tbl2 (a text)');
my $slotname = 'physical_slot';
$node_p->safe_psql('pg2',
    "SELECT pg_create_physical_replication_slot('$slotname')");
```

I think the setup of the same node can be gathered into one place.
Also, any settings and definitions should be done just before they are used.

21.
```
$node_s->append_conf(
    'postgresql.conf', qq[
log_min_messages = debug2
primary_slot_name = '$slotname'
]);
```

I could not find a reason why this is set to debug2.

22.
```
command_fails
```

command_checks_all() can check the returned value and the outputs.
Should we use it?

23.

Can you add a header for each test case? E.g.,

```
# ------------------------------
# Check pg_createsubscriber fails when the target server is not a
# standby of the source.
...
# ------------------------------
# Check pg_createsubscriber fails when the target server is not running
...
# ------------------------------
# Check pg_createsubscriber fails when the target server is a member of
# the cascading standby.
...
# ------------------------------
# Check successful dry-run 
...
# ------------------------------
# Check successful conversion
```

24.
```
# Stop node C
$node_c->teardown_node;
...
$node_p->stop;
$node_s->stop;
$node_f->stop;
```
Why did you choose teardown?

25.

The creation of subscriptions is not directly tested. @subnames contains the
names of the subscriptions, but the test just assumes there are more than two of them.

Since it may be useful, I will post a top-up patch on Monday if there are no updates.

[1]: https://www.postgresql.org/message-id/89ccf72b-59f9-4317-b9fd-1e6d20a0c3b1%40app.fastmail.com
[2]:
https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

> Since it may be useful, I will post a top-up patch on Monday if there are no
> updates.

And here are top-up patches. Feel free to check and include.

v22-0001: Same as v21-0001.
=== rebased patches ===
v22-0002: Update docs per recent changes. Same as v20-0002.
v22-0003: Add check versions of the target. Extracted from v20-0003.
v22-0004: Remove -S option. Mostly same as v20-0009, but the commit message was
          slightly changed.
=== Newbie ===
V22-0005: Addressed my comments which seem to be trivial [1].
          Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
v22-0006: Consider the scenario where commands fail after the recovery.
          drop_subscription() is removed and some messages are added per [2].
V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, 7, were addressed here.
V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
V22-0009: Prohibit reuse publications when it has already existed. Per comments #11 and 12 [1].
V22-0010: Avoid to call PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
V22-0011: Update testcode. Per comments #17- [1].

I did not handle below points because I have unclear points.

a. 
This patch set cannot detect the disconnection between the target (standby) and
source (primary) during the catch up. Because the connection status must be gotten
at the same time (=in the same query) with the recovery status, but now it is now an
independed function (server_is_in_recovery()).

b.
This patch set cannot detect the inconsistency reported by Shubham [3]. I could not
come up with solutions without removing -P...
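For what it's worth, point (a) could be addressed by folding both checks into a single round trip. The sketch below is only an illustration of that idea, not what the patch does; the combined query and the helper name are assumptions (pg_stat_wal_receiver being empty once the walreceiver dies is used as the disconnection signal):

```c
#include <string.h>

/*
 * Hypothetical combined query: one round trip returns both the recovery
 * status and whether the standby is still connected to the primary
 * (pg_stat_wal_receiver has no row once the walreceiver is gone).
 */
static const char *recovery_query =
	"SELECT pg_catalog.pg_is_in_recovery(), "
	"       EXISTS (SELECT 1 FROM pg_catalog.pg_stat_wal_receiver)";

/*
 * Interpret the two boolean columns ("t"/"f") of the query above.
 * Returns 1 while recovery is in progress and the walreceiver is alive,
 * 0 when recovery has ended, and -1 when the standby lost its primary.
 */
static int
interpret_recovery_status(const char *in_recovery, const char *receiver_alive)
{
	if (strcmp(in_recovery, "t") != 0)
		return 0;				/* recovery finished; promotion done */
	if (strcmp(receiver_alive, "t") != 0)
		return -1;				/* still in recovery but disconnected */
	return 1;					/* keep waiting */
}
```

With something like this, the wait loop could abort as soon as the return value is -1 instead of waiting forever.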

[1]:
https://www.postgresql.org/message-id/TYCPR01MB12077756323B79042F29DDAEDF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077E98F930C3DE6BD304D0DF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CAHv8Rj%2B5mzK9Jt%2B7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
Some review of the v21 patch:

- commit message

Mention pg_createsubscriber in the commit message title.  That's the 
most important thing that someone doing git log searches in the future 
will be looking for.


- doc/src/sgml/ref/allfiles.sgml

Move the new entry to alphabetical order.


- doc/src/sgml/ref/pg_createsubscriber.sgml

+  <para>
+   The <application>pg_createsubscriber</application> should be run at the target
+   server. The source server (known as publisher server) should accept logical
+   replication connections from the target server (known as subscriber server).
+   The target server should accept local logical replication connection.
+  </para>

"should" -> "must" ?

+ <refsect1>
+  <title>Options</title>

Sort options alphabetically.

It would be good to indicate somewhere which options are mandatory.

+ <refsect1>
+  <title>Examples</title>

I suggest including a pg_basebackup call into this example, so it's
easier for readers to get the context of how this is supposed to be
used.  You can add that pg_basebackup in this example is just an example 
and that other base backups can also be used.


- doc/src/sgml/reference.sgml

Move the new entry to alphabetical order.


- src/bin/pg_basebackup/Makefile

Move the new sections to alphabetical order.


- src/bin/pg_basebackup/meson.build

Move the new sections to alphabetical order.


- src/bin/pg_basebackup/pg_createsubscriber.c

+typedef struct CreateSubscriberOptions
+typedef struct LogicalRepInfo

I think these kinds of local-use struct don't need to be typedef'ed.
(Then you also don't need to update typedefs.list.)

+static void
+usage(void)

Sort the options alphabetically.

+static char *
+get_exec_path(const char *argv0, const char *progname)

Can this not use find_my_exec() and find_other_exec()?

+int
+main(int argc, char **argv)

Sort the options alphabetically (long_options struct, getopt_long()
argument, switch cases).


- .../t/040_pg_createsubscriber.pl
- .../t/041_pg_createsubscriber_standby.pl

These two files could be combined into one.

+# Force it to initialize a new cluster instead of copying a
+# previously initdb'd cluster.

Explain why?

+$node_s->append_conf(
+    'postgresql.conf', qq[
+log_min_messages = debug2

Is this setting necessary for the test?




Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

I have reviewed the v21 patch. And found an issue.

Initially I started the standby server with a new postgresql.conf file
(not the default postgresql.conf that is present in the instance).
pg_ctl -D ../standby start -o "-c config_file=/new_path/postgresql.conf"

And I have made 'max_replication_slots = 1' in new postgresql.conf and
made  'max_replication_slots = 0' in the default postgresql.conf file.
Now when we run pg_createsubscriber on standby we get error:
pg_createsubscriber: error: could not set replication progress for the
subscription "pg_createsubscriber_5_242843": ERROR:  cannot query or
manipulate replication origin when max_replication_slots = 0
NOTICE:  dropped replication slot "pg_createsubscriber_5_242843" on publisher
pg_createsubscriber: error: could not drop publication
"pg_createsubscriber_5" on database "postgres": ERROR:  publication
"pg_createsubscriber_5" does not exist
pg_createsubscriber: error: could not drop replication slot
"pg_createsubscriber_5_242843" on database "postgres": ERROR:
replication slot "pg_createsubscriber_5_242843" does not exist

I observed that when we run the pg_createsubscriber command, it will
stop the standby instance (the non-default postgres configuration) and
restart the standby instance which will now be started with default
postgresql.conf, where the 'max_replication_slot = 0' and
pg_createsubscriber will now fail with the error given above.
I have added the script file with which we can reproduce this issue.
Also similar issues can happen with other configurations such as port, etc.

The possible solutions would be:
1) Allow running pg_createsubscriber only when the standby is initially stopped.
I observed that pglogical_create_subscriber also uses this approach.
2) Read GUCs via the SHOW command and restore them when the server restarts.

I would prefer the 1st solution.
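For the 2nd solution, a rough sketch of the idea: before stopping the standby, capture the result of "SHOW config_file" and pass it back when pg_createsubscriber restarts the server. The helper name and command layout below are hypothetical; the real tool drives the restart through pg_ctl internally:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the pg_ctl start command, reusing the
 * config_file value obtained earlier via "SHOW config_file" so the
 * restarted standby keeps the same non-default configuration.
 */
static int
build_start_command(char *buf, size_t len,
					const char *pg_ctl_path, const char *datadir,
					const char *config_file)
{
	return snprintf(buf, len,
					"\"%s\" start -D \"%s\" -o \"-c config_file=%s\"",
					pg_ctl_path, datadir, config_file);
}
```

This would also need to cover other settings passed on the original command line (port, etc.), which is why option 1 is simpler.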

Thanks and Regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Feb 19, 2024, at 6:47 AM, Peter Eisentraut wrote:
Some review of the v21 patch:

Thanks for checking.

- commit message

Mention pg_createsubscriber in the commit message title.  That's the 
most important thing that someone doing git log searches in the future 
will be looking for.

Right. Done.

- doc/src/sgml/ref/allfiles.sgml

Move the new entry to alphabetical order.

Done.


- doc/src/sgml/ref/pg_createsubscriber.sgml

+  <para>
+   The <application>pg_createsubscriber</application> should be run at the target
+   server. The source server (known as publisher server) should accept logical
+   replication connections from the target server (known as subscriber server).
+   The target server should accept local logical replication connection.
+  </para>

"should" -> "must" ?

Done.

+ <refsect1>
+  <title>Options</title>

Sort options alphabetically.

Done.

It would be good to indicate somewhere which options are mandatory.

I'll add this information in the option description. AFAICT the synopsis kind
of indicates it.

+ <refsect1>
+  <title>Examples</title>

I suggest including a pg_basebackup call into this example, so it's
easier for readers to get the context of how this is supposed to be
used.  You can add that pg_basebackup in this example is just an example 
and that other base backups can also be used.

We can certainly add it, but isn't creating a standby out of scope here? I will
make sure to include references to pg_basebackup and the "Setting up a Standby
Server" section.

- doc/src/sgml/reference.sgml

Move the new entry to alphabetical order.

Done.

- src/bin/pg_basebackup/Makefile

Move the new sections to alphabetical order.

Done.

- src/bin/pg_basebackup/meson.build

Move the new sections to alphabetical order.

Done.


- src/bin/pg_basebackup/pg_createsubscriber.c

+typedef struct CreateSubscriberOptions
+typedef struct LogicalRepInfo

I think these kinds of local-use struct don't need to be typedef'ed.
(Then you also don't need to update typedefs.list.)

Done.

+static void
+usage(void)

Sort the options alphabetically.

Are you referring to s/options/functions/?

+static char *
+get_exec_path(const char *argv0, const char *progname)

Can this not use find_my_exec() and find_other_exec()?

It does indeed use them. I created this function because the same code path
needs to run twice (for pg_ctl and pg_resetwal).

+int
+main(int argc, char **argv)

Sort the options alphabetically (long_options struct, getopt_long()
argument, switch cases).

Done.

- .../t/040_pg_createsubscriber.pl
- .../t/041_pg_createsubscriber_standby.pl

These two files could be combined into one.

Done.

+# Force it to initialize a new cluster instead of copying a
+# previously initdb'd cluster.

Explain why?

Ok. It needs a new cluster because it will have a different system identifier
so we can make sure the target cluster is a copy of the source server.

+$node_s->append_conf(
+ 'postgresql.conf', qq[
+log_min_messages = debug2

Is this setting necessary for the test?

No. It is here as a debugging aid. Better to include it in a separate patch.
There are a few messages that I don't intend to include in the final patch.

All of these modifications will be included in the next patch. I'm finishing
integrating the patches proposed by Hayato [1] along with some additional fixes
and refactoring.




--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Feb 19, 2024, at 7:22 AM, Shlok Kyal wrote:
I have reviewed the v21 patch. And found an issue.

Initially I started the standby server with a new postgresql.conf file
(not the default postgresql.conf that is present in the instance).
pg_ctl -D ../standby start -o "-c config_file=/new_path/postgresql.conf"

And I have made 'max_replication_slots = 1' in new postgresql.conf and
made  'max_replication_slots = 0' in the default postgresql.conf file.
Now when we run pg_createsubscriber on standby we get error:
pg_createsubscriber: error: could not set replication progress for the
subscription "pg_createsubscriber_5_242843": ERROR:  cannot query or
manipulate replication origin when max_replication_slots = 0

That's by design. See [1]. The max_replication_slots parameter is used as the
maximum number of subscriptions on the server.

NOTICE:  dropped replication slot "pg_createsubscriber_5_242843" on publisher
pg_createsubscriber: error: could not drop publication
"pg_createsubscriber_5" on database "postgres": ERROR:  publication
"pg_createsubscriber_5" does not exist
pg_createsubscriber: error: could not drop replication slot
"pg_createsubscriber_5_242843" on database "postgres": ERROR:
replication slot "pg_createsubscriber_5_242843" does not exist

That's a bug and should be fixed.



--
Euler Taveira

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

On Tue, 20 Feb 2024 at 06:59, Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Feb 19, 2024, at 7:22 AM, Shlok Kyal wrote:
>
> I have reviewed the v21 patch. And found an issue.
>
> Initially I started the standby server with a new postgresql.conf file
> (not the default postgresql.conf that is present in the instance).
> pg_ctl -D ../standby start -o "-c config_file=/new_path/postgresql.conf"
>
> And I have made 'max_replication_slots = 1' in new postgresql.conf and
> made  'max_replication_slots = 0' in the default postgresql.conf file.
> Now when we run pg_createsubscriber on standby we get error:
> pg_createsubscriber: error: could not set replication progress for the
> subscription "pg_createsubscriber_5_242843": ERROR:  cannot query or
> manipulate replication origin when max_replication_slots = 0
>
>
> That's by design. See [1]. The max_replication_slots parameter is used as the
> maximum number of subscriptions on the server.
>
> NOTICE:  dropped replication slot "pg_createsubscriber_5_242843" on publisher
> pg_createsubscriber: error: could not drop publication
> "pg_createsubscriber_5" on database "postgres": ERROR:  publication
> "pg_createsubscriber_5" does not exist
> pg_createsubscriber: error: could not drop replication slot
> "pg_createsubscriber_5_242843" on database "postgres": ERROR:
> replication slot "pg_createsubscriber_5_242843" does not exist
>
>
> That's a bug and should be fixed.
>
> [1] https://www.postgresql.org/docs/current/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SUBSCRIBER
>
I think you misunderstood the issue I reported.

My main concern is that the standby server is using a different
'postgresql.conf' file (the default file) after:
+   /* Start subscriber and wait until accepting connections */
+   pg_log_info("starting the subscriber");
+   if (!dry_run)
+       start_standby_server(pg_ctl_path, opt.subscriber_dir, server_start_log);

But we initially started the standby server (before running
pg_createsubscriber) with a new postgresql.conf file (different from the
default file; for this example, created inside the 'new_path' folder).
pg_ctl -D ../standby -l standby.log  start -o "-c
config_file=../new_path/postgresql.conf"

So, when we run pg_createsubscriber, all the initial checks run on the
standby server started with the new postgresql.conf file. But
pg_createsubscriber then restarts the standby server using the default
postgresql.conf file, and this might create some issues.

Thanks and Regards,
Shlok Kyal



Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 19 Feb 2024 at 11:15, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > Since it may be useful, I will post a top-up patch on Monday if there are no
> > updates.
>
> And here are top-up patches. Feel free to check and include.
>
> v22-0001: Same as v21-0001.
> === rebased patches ===
> v22-0002: Update docs per recent changes. Same as v20-0002.
> v22-0003: Add version checks of the target. Extracted from v20-0003.
> v22-0004: Remove -S option. Mostly same as v20-0009, but the commit message was
>           slightly changed.
> === Newbie ===
> V22-0005: Addressed my comments which seemed to be trivial [1].
>           Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
> v22-0006: Consider the scenario where commands fail after the recovery.
>           drop_subscription() is removed and some messages are added per [2].
> V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, and 7 were addressed here.
> V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
> V22-0009: Prohibit reusing a publication when it already exists. Per comments #11 and 12 [1].
> V22-0010: Avoid calling PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
> V22-0011: Update the test code. Per comments #17- [1].
>
> I did not handle the points below because some things are unclear to me.
>
> a.
> This patch set cannot detect a disconnection between the target (standby) and
> source (primary) during the catch-up, because the connection status must be
> obtained at the same time (i.e., in the same query) as the recovery status,
> but that check is currently an independent function (server_is_in_recovery()).
>
> b.
> This patch set cannot detect the inconsistency reported by Shubham [3]. I could not
> come up with solutions without removing -P...
>

Few comments for v22-0001 patch:
1) The second "if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)"" should
be if (strcmp(PQgetvalue(res, 0, 2), "t") != 0):
+       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
+       {
+               pg_log_error("permission denied for database %s",
dbinfo[0].dbname);
+               return false;
+       }
+       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
+       {
+               pg_log_error("permission denied for function \"%s\"",
+
"pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
+               return false;
+       }
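For illustration, the corrected logic reads each boolean column at its own index. Here is a compilable sketch of the intent, with the column values mocked as plain strings standing in for PQgetvalue(res, 0, 0..2); the meaning of column 0 is an assumption on my part:

```c
#include <string.h>

/*
 * Sketch of the fix: the query returns three boolean columns and each
 * check must read its own column.  cols[] stands in for
 * PQgetvalue(res, 0, 0), (res, 0, 1) and (res, 0, 2).
 */
static int
check_permissions(const char *cols[3])
{
	if (strcmp(cols[0], "t") != 0)
		return 1;				/* first privilege check failed */
	if (strcmp(cols[1], "t") != 0)
		return 2;				/* permission denied for database */
	if (strcmp(cols[2], "t") != 0)	/* was wrongly reading column 1 */
		return 3;				/* permission denied for function */
	return 0;					/* all checks passed */
}
```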

2) pg_createsubscriber fails if a table is created in parallel on the
primary node:
2024-02-20 14:38:49.005 IST [277261] LOG:  database system is ready to
accept connections
2024-02-20 14:38:54.346 IST [277270] ERROR:  relation "public.tbl5"
does not exist
2024-02-20 14:38:54.346 IST [277270] STATEMENT:  CREATE SUBSCRIPTION
pg_createsubscriber_5_277236 CONNECTION ' dbname=postgres' PUBLICATION
pg_createsubscriber_5 WITH (create_slot = false, copy_data = false,
enabled = false)

If we are not planning to fix this, at least it should be documented

3) Error cases in verbose mode produce invalid error messages, like the
"out of memory" message below:
pg_createsubscriber: waiting the postmaster to reach the consistent state
pg_createsubscriber: postmaster reached the consistent state
pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
database "postgres"
pg_createsubscriber: creating subscription
"pg_createsubscriber_5_278343" on database "postgres"
pg_createsubscriber: error: could not create subscription
"pg_createsubscriber_5_278343" on database "postgres": out of memory

4) In error cases we try to drop this publication again, resulting in an error:
+               /*
+                * Since the publication was created before the
consistent LSN, it is
+                * available on the subscriber when the physical
replica is promoted.
+                * Remove publications from the subscriber because it
has no use.
+                */
+               drop_publication(conn, &dbinfo[i]);

Which throws these errors (because the publication is dropped multiple times):
pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
database "postgres"
pg_createsubscriber: error: could not drop publication
"pg_createsubscriber_5" on database "postgres": ERROR:  publication
"pg_createsubscriber_5" does not exist
pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
database "postgres"
pg_createsubscriber: dropping the replication slot
"pg_createsubscriber_5_278343" on database "postgres"

5) In error cases, wait_for_end_recovery waits even though it has
identified that replication between the primary and standby has
stopped:
+/*
+ * Is recovery still in progress?
+ * If the answer is yes, it returns 1, otherwise, returns 0. If an error occurs
+ * while executing the query, it returns -1.
+ */
+static int
+server_is_in_recovery(PGconn *conn)
+{
+       PGresult   *res;
+       int                     ret;
+
+       res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");
+
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               PQclear(res);
+               pg_log_error("could not obtain recovery progress");
+               return -1;
+       }
+

You can simulate this by stopping the primary just before
wait_for_end_recovery and you will see these error messages, but
pg_createsubscriber will continue to wait:
pg_createsubscriber: error: could not obtain recovery progress
pg_createsubscriber: error: could not obtain recovery progress
pg_createsubscriber: error: could not obtain recovery progress
pg_createsubscriber: error: could not obtain recovery progress
...

Regards,
Vignesh



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

Thanks for giving comments!

> Few comments for v22-0001 patch:
> 1) The second "if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)"" should
> be if (strcmp(PQgetvalue(res, 0, 2), "t") != 0):
> +       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
> +       {
> +               pg_log_error("permission denied for database %s",
> dbinfo[0].dbname);
> +               return false;
> +       }
> +       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
> +       {
> +               pg_log_error("permission denied for function \"%s\"",
> +
> "pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
> +               return false;
> +       }

I already pointed this out as comment #8 [1], and it is fixed in v22-0005.

> 2) pg_createsubscriber fails if a table is created in parallel on the
> primary node:
> 2024-02-20 14:38:49.005 IST [277261] LOG:  database system is ready to
> accept connections
> 2024-02-20 14:38:54.346 IST [277270] ERROR:  relation "public.tbl5"
> does not exist
> 2024-02-20 14:38:54.346 IST [277270] STATEMENT:  CREATE SUBSCRIPTION
> pg_createsubscriber_5_277236 CONNECTION ' dbname=postgres' PUBLICATION
> pg_createsubscriber_5 WITH (create_slot = false, copy_data = false,
> enabled = false)
> 
> If we are not planning to fix this, at least it should be documented

The error occurs when tables are created after the promotion, right?
I think it cannot be fixed until logical replication of DDL is implemented.
So, +1 for adding a description.

> 3) Error cases in verbose mode produce invalid error messages, like the
> "out of memory" message below:
> pg_createsubscriber: waiting the postmaster to reach the consistent state
> pg_createsubscriber: postmaster reached the consistent state
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: creating subscription
> "pg_createsubscriber_5_278343" on database "postgres"
> pg_createsubscriber: error: could not create subscription
> "pg_createsubscriber_5_278343" on database "postgres": out of memory

This is because some places wrongly use PQerrorMessage(); they should use
PQresultErrorMessage(). Fixed in v22-0005.

> 4) In error cases we try to drop this publication again resulting in error:
> +               /*
> +                * Since the publication was created before the
> consistent LSN, it is
> +                * available on the subscriber when the physical
> replica is promoted.
> +                * Remove publications from the subscriber because it
> has no use.
> +                */
> +               drop_publication(conn, &dbinfo[i]);
> 
> Which throws these errors (because the publication is dropped multiple times):
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: error: could not drop publication
> "pg_createsubscriber_5" on database "postgres": ERROR:  publication
> "pg_createsubscriber_5" does not exist
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: dropping the replication slot
> "pg_createsubscriber_5_278343" on database "postgres"

Right. One approach is to use a DROP PUBLICATION IF EXISTS statement.
Thoughts?
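A minimal sketch of that approach, assuming the identifier needs no extra quoting here (the real code would still send the string through PQexec() and check the result):

```c
#include <stdio.h>
#include <string.h>

/*
 * Sketch of the IF EXISTS approach: dropping this way is idempotent, so
 * a second cleanup pass no longer errors out when the publication is
 * already gone.  Identifier quoting is elided for brevity.
 */
static int
build_drop_publication(char *buf, size_t len, const char *pubname)
{
	return snprintf(buf, len, "DROP PUBLICATION IF EXISTS %s", pubname);
}
```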

> 5) In error cases, wait_for_end_recovery waits even though it has
> identified that the replication between primary and standby is
> stopped:
> +/*
> + * Is recovery still in progress?
> + * If the answer is yes, it returns 1, otherwise, returns 0. If an error occurs
> + * while executing the query, it returns -1.
> + */
> +static int
> +server_is_in_recovery(PGconn *conn)
> +{
> +       PGresult   *res;
> +       int                     ret;
> +
> +       res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");
> +
> +       if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +       {
> +               PQclear(res);
> +               pg_log_error("could not obtain recovery progress");
> +               return -1;
> +       }
> +
> 
> You can simulate this by stopping the primary just before
> wait_for_end_recovery and you will see these error messages, but
> pg_createsubscriber will continue to wait:
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress

Yeah, v22-0001 cannot detect a disconnection between the primary and standby.
v22-0007 can detect a standby crash, but the v22 set cannot detect a primary
crash. Euler came up with an approach [2] for it, but it is not implemented yet.


[1]:
https://www.postgresql.org/message-id/TYCPR01MB12077756323B79042F29DDAEDF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/2231a04b-f2d4-4a4e-b5cd-56be8b002427%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
vignesh C
Date:
On Tue, 20 Feb 2024 at 15:47, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> Thanks for giving comments!
>
> > Few comments for v22-0001 patch:
> > 1) The second "if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)"" should
> > be if (strcmp(PQgetvalue(res, 0, 2), "t") != 0):
> > +       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
> > +       {
> > +               pg_log_error("permission denied for database %s",
> > dbinfo[0].dbname);
> > +               return false;
> > +       }
> > +       if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
> > +       {
> > +               pg_log_error("permission denied for function \"%s\"",
> > +
> > "pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
> > +               return false;
> > +       }
>
> I have already pointed out as comment #8 [1] and fixed in v22-0005.
>
> > 2) pg_createsubscriber fails if a table is created in parallel on the
> > primary node:
> > 2024-02-20 14:38:49.005 IST [277261] LOG:  database system is ready to
> > accept connections
> > 2024-02-20 14:38:54.346 IST [277270] ERROR:  relation "public.tbl5"
> > does not exist
> > 2024-02-20 14:38:54.346 IST [277270] STATEMENT:  CREATE SUBSCRIPTION
> > pg_createsubscriber_5_277236 CONNECTION ' dbname=postgres' PUBLICATION
> > pg_createsubscriber_5 WITH (create_slot = false, copy_data = false,
> > enabled = false)
> >
> > If we are not planning to fix this, at least it should be documented
>
> The error will be occurred when tables are created after the promotion, right?
> I think it cannot be fixed until DDL logical replication would be implemented.
> So, +1 to add descriptions.
>
> > 3) Error cases in verbose mode produce invalid error messages, like the
> > "out of memory" message below:
> > pg_createsubscriber: waiting the postmaster to reach the consistent state
> > pg_createsubscriber: postmaster reached the consistent state
> > pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> > database "postgres"
> > pg_createsubscriber: creating subscription
> > "pg_createsubscriber_5_278343" on database "postgres"
> > pg_createsubscriber: error: could not create subscription
> > "pg_createsubscriber_5_278343" on database "postgres": out of memory
>
> Because some places use PQerrorMessage() wrongly. It should be
> PQresultErrorMessage(). Fixed in v22-0005.
>
> > 4) In error cases we try to drop this publication again resulting in error:
> > +               /*
> > +                * Since the publication was created before the
> > consistent LSN, it is
> > +                * available on the subscriber when the physical
> > replica is promoted.
> > +                * Remove publications from the subscriber because it
> > has no use.
> > +                */
> > +               drop_publication(conn, &dbinfo[i]);
> >
> > Which throws these errors(because of drop publication multiple times):
> > pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> > database "postgres"
> > pg_createsubscriber: error: could not drop publication
> > "pg_createsubscriber_5" on database "postgres": ERROR:  publication
> > "pg_createsubscriber_5" does not exist
> > pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> > database "postgres"
> > pg_createsubscriber: dropping the replication slot
> > "pg_createsubscriber_5_278343" on database "postgres"
>
> Right. One approach is to use a DROP PUBLICATION IF EXISTS statement.
> Thoughts?

Another way would be to set made_publication to false in drop_publication()
once the publication is dropped. That way, cleanup_objects_atexit() will not
try to drop the publication again, since made_publication will already be
false.
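A small mock of that control flow (the libpq calls are elided; only the flag handling suggested above is modeled, with names matching the patch's made_publication):

```c
#include <stdbool.h>

/*
 * Mock of the flag-based alternative: drop_publication() clears
 * made_publication, so the atexit cleanup path skips the second drop.
 * drops_attempted only exists here to make the behavior observable.
 */
static bool made_publication = true;
static int	drops_attempted = 0;

static void
drop_publication_mock(void)
{
	if (!made_publication)
		return;					/* already dropped; nothing to do */
	drops_attempted++;			/* the real code runs DROP PUBLICATION here */
	made_publication = false;	/* the suggested fix */
}
```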

Regards,
Vignesh



Re: speed up a logical replica setup

From
Alvaro Herrera
Date:
On 2024-Feb-16, Hayato Kuroda (Fujitsu) wrote:

> 15.
> 
> You said that in case of failure, cleanup is not needed if the process exits
> soon [1]. But some functions call PQfinish() and then exit(1) or pg_fatal().
> Should we follow that?

Hmm, but doesn't this mean that the server will log an ugly message that
"client closed connection unexpectedly"?  I think it's nicer to close
the connection before terminating the process (especially since the
code for that is already written).

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"We’ve narrowed the problem down to the customer’s pants being in a situation
 of vigorous combustion" (Robert Haas, Postgres expert extraordinaire)



Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 19 Feb 2024 at 11:15, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > Since it may be useful, I will post a top-up patch on Monday if there are no
> > updates.
>
> And here are top-up patches. Feel free to check and include.
>
> v22-0001: Same as v21-0001.
> === rebased patches ===
> v22-0002: Update docs per recent changes. Same as v20-0002.
> v22-0003: Add version checks of the target. Extracted from v20-0003.
> v22-0004: Remove -S option. Mostly same as v20-0009, but the commit message was
>           slightly changed.
> === Newbie ===
> V22-0005: Addressed my comments which seemed to be trivial [1].
>           Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
> v22-0006: Consider the scenario where commands fail after the recovery.
>           drop_subscription() is removed and some messages are added per [2].
> V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, and 7 were addressed here.
> V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
> V22-0009: Prohibit reusing a publication when it already exists. Per comments #11 and 12 [1].
> V22-0010: Avoid calling PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
> V22-0011: Update the test code. Per comments #17- [1].
>
> I did not handle the points below because some things are unclear to me.
>
> a.
> This patch set cannot detect a disconnection between the target (standby) and
> source (primary) during the catch-up, because the connection status must be
> obtained at the same time (i.e., in the same query) as the recovery status,
> but that check is currently an independent function (server_is_in_recovery()).
>
> b.
> This patch set cannot detect the inconsistency reported by Shubham [3]. I could not
> come up with solutions without removing -P...

Few comments regarding the documentation:
1) The max_replication_slots information seems to be present a couple of times:

+    <para>
+     The target instance must have
+     <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+     and <link
linkend="guc-max-logical-replication-workers"><varname>max_logical_replication_workers</varname></link>
+     configured to a value greater than or equal to the number of target
+     databases.
+    </para>

+   <listitem>
+    <para>
+     The target instance must have
+     <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+     configured to a value greater than or equal to the number of target
+     databases and replication slots.
+    </para>
+   </listitem>

2) Can we add an id to prerequisites and use it instead of referring
to r1-app-pg_createsubscriber-1:
-     <application>pg_createsubscriber</application> checks if the
given target data
-     directory has the same system identifier than the source data directory.
-     Since it uses the recovery process as one of the steps, it starts the
-     target server as a replica from the source server. If the system
-     identifier is not the same,
<application>pg_createsubscriber</application> will
-     terminate with an error.
+     Checks the target can be converted.  In particular, things listed in
+     <link linkend="r1-app-pg_createsubscriber-1">above section</link> would be
+     checked.  If these are not met
<application>pg_createsubscriber</application>
+     will terminate with an error.
     </para>

3) The code also checks the following:
 Verify that a PostgreSQL binary (progname) is available in the same
directory as pg_createsubscriber.

But this is not present in the documentation's prerequisites.

4) Here we mention that the target server should be stopped, but the
same is not mentioned in prerequisites:
+   Here is an example of using <application>pg_createsubscriber</application>.
+   Before running the command, please make sure target server is stopped.
+<screen>
+<prompt>$</prompt> <userinput>pg_ctl -D /usr/local/pgsql/data stop</userinput>
+</screen>
+

5) If there is an error during any of the pg_createsubscriber
operations, such as a failure while creating a subscription, it might
not be possible to roll back to the earlier state that had
physical-standby replication. I feel we should document this and also
add it to the console message, as we do in the case of pg_upgrade.

Regards,
Vignesh



Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 19 Feb 2024 at 11:15, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > Since it may be useful, I will post top-up patch on Monday, if there are no
> > updating.
>
> And here are top-up patches. Feel free to check and include.
>
> v22-0001: Same as v21-0001.
> === rebased patches ===
> v22-0002: Update docs per recent changes. Same as v20-0002.
> v22-0003: Add check versions of the target. Extracted from v20-0003.
> v22-0004: Remove -S option. Mostly same as v20-0009, but commit massage was
>           slightly changed.
> === Newbie ===
> V22-0005: Addressed my comments which seems to be trivial[1].
>           Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
> v22-0006: Consider the scenario when commands are failed after the recovery.
>           drop_subscription() is removed and some messages are added per [2].
> V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, 7, were addressed here.
> V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
> V22-0009: Prohibit reuse publications when it has already existed. Per comments #11 and 12 [1].
> V22-0010: Avoid to call PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
> V22-0011: Update testcode. Per comments #17- [1].
>
> I did not handle below points because I have unclear points.
>
> a.
> This patch set cannot detect the disconnection between the target (standby) and
> source (primary) during the catch up. Because the connection status must be gotten
> at the same time (=in the same query) with the recovery status, but now it is now an
> independed function (server_is_in_recovery()).
>
> b.
> This patch set cannot detect the inconsistency reported by Shubham [3]. I could not
> come up with solutions without removing -P...

Few comments:
1) The below code can lead to assertion failure if the publisher is
stopped while dropping the replication slot:
+       if (primary_slot_name != NULL)
+       {
+               conn = connect_database(dbinfo[0].pubconninfo);
+               if (conn != NULL)
+               {
+                       drop_replication_slot(conn, &dbinfo[0],
primary_slot_name);
+               }
+               else
+               {
+                       pg_log_warning("could not drop replication
slot \"%s\" on primary",
+                                                  primary_slot_name);
+                       pg_log_warning_hint("Drop this replication
slot soon to avoid retention of WAL files.");
+               }
+               disconnect_database(conn);
+       }

pg_createsubscriber: error: connection to database failed: connection
to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or
directory
Is the server running locally and accepting connections on that socket?
pg_createsubscriber: warning: could not drop replication slot
"standby_1" on primary
pg_createsubscriber: hint: Drop this replication slot soon to avoid
retention of WAL files.
pg_createsubscriber: pg_createsubscriber.c:432: disconnect_database:
Assertion `conn != ((void *)0)' failed.
Aborted (core dumped)

This is happening because we call disconnect_database in the
connection failure case too, which has the following assert:
+static void
+disconnect_database(PGconn *conn)
+{
+       Assert(conn != NULL);
+
+       PQfinish(conn);
+}

2) There is a CheckDataVersion function which does exactly this; can
we use it instead:
+       /* Check standby server version */
+       if ((ver_fd = fopen(versionfile, "r")) == NULL)
+               pg_fatal("could not open file \"%s\" for reading: %m",
versionfile);
+
+       /* Version number has to be the first line read */
+       if (!fgets(rawline, sizeof(rawline), ver_fd))
+       {
+               if (!ferror(ver_fd))
+                       pg_fatal("unexpected empty file \"%s\"", versionfile);
+               else
+                       pg_fatal("could not read file \"%s\": %m", versionfile);
+       }
+
+       /* Strip trailing newline and carriage return */
+       (void) pg_strip_crlf(rawline);
+
+       if (strcmp(rawline, PG_MAJORVERSION) != 0)
+       {
+               pg_log_error("standby server is of wrong version");
+               pg_log_error_detail("File \"%s\" contains \"%s\",
which is not compatible with this program's version \"%s\".",
+                                                       versionfile,
rawline, PG_MAJORVERSION);
+               exit(1);
+       }
+
+       fclose(ver_fd);

3) Should this be added to typedefs.list:
+enum WaitPMResult
+{
+       POSTMASTER_READY,
+       POSTMASTER_STILL_STARTING
+};

4) pgCreateSubscriber should be mentioned after pg_controldata to keep
the ordering consistent:
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index aa94f6adf6..c5edd244ef 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -285,6 +285,7 @@
    &pgCtl;
    &pgResetwal;
    &pgRewind;
+   &pgCreateSubscriber;
    &pgtestfsync;

5) Here pg_replication_slots should be pg_catalog.pg_replication_slots:
+       if (primary_slot_name)
+       {
+               appendPQExpBuffer(str,
+                                                 "SELECT 1 FROM
pg_replication_slots "
+                                                 "WHERE active AND
slot_name = '%s'",
+                                                 primary_slot_name);

6) Here pg_settings should be pg_catalog.pg_settings:
+        * - max_worker_processes >= 1 + number of dbs to be converted
+        *------------------------------------------------------------------------
+        */
+       res = PQexec(conn,
+                                "SELECT setting FROM pg_settings
WHERE name IN ("
+                                "'max_logical_replication_workers', "
+                                "'max_replication_slots', "
+                                "'max_worker_processes', "
+                                "'primary_slot_name') "
+                                "ORDER BY name");

Regards,
Vignesh



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
> And here are top-up patches. Feel free to check and include.
>
> v22-0001: Same as v21-0001.
> === rebased patches ===
> v22-0002: Update docs per recent changes. Same as v20-0002.
> v22-0003: Add check versions of the target. Extracted from v20-0003.
> v22-0004: Remove -S option. Mostly same as v20-0009, but commit massage was
>           slightly changed.
> === Newbie ===
> V22-0005: Addressed my comments which seems to be trivial[1].
>           Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
> v22-0006: Consider the scenario when commands are failed after the recovery.
>           drop_subscription() is removed and some messages are added per [2].
> V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, 7, were addressed here.
> V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
> V22-0009: Prohibit reuse publications when it has already existed. Per comments #11 and 12 [1].
> V22-0010: Avoid to call PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
> V22-0011: Update testcode. Per comments #17- [1].

I found some issues and fixed them with top-up patches
v23-0012 and v23-0013:
1.
Suppose there is a cascading physical replication chain node1->node2->node3.
Now if we run pg_createsubscriber with node1 as primary and node2 as
standby, pg_createsubscriber will be successful, but the connection
between node2 and node3 will not be retained and the log of node3 will
show the error:
2024-02-20 12:32:12.340 IST [277664] FATAL:  database system
identifier differs between the primary and standby
2024-02-20 12:32:12.340 IST [277664] DETAIL:  The primary's identifier
is 7337575856950914038, the standby's identifier is
7337575783125171076.
2024-02-20 12:32:12.341 IST [277491] LOG:  waiting for WAL to become
available at 0/3000F10

To fix this, I prevent pg_createsubscriber from running if the standby
node is a primary to any other server.
Made the change in the v23-0012 patch.

2.
While checking 'max_replication_slots' in the 'check_publisher' function,
we are not considering the temporary slot in the check:
+   if (max_repslots - cur_repslots < num_dbs)
+   {
+       pg_log_error("publisher requires %d replication slots, but
only %d remain",
+                    num_dbs, max_repslots - cur_repslots);
+       pg_log_error_hint("Consider increasing max_replication_slots
to at least %d.",
+                         cur_repslots + num_dbs);
+       return false;
+   }
Fixed this in v23-0013

v23-0001 to v23-0011 are the same as v22-0001 to v22-0011

Thanks and Regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 19 Feb 2024 at 11:15, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > Since it may be useful, I will post top-up patch on Monday, if there are no
> > updating.
>
> And here are top-up patches. Feel free to check and include.
>
> v22-0001: Same as v21-0001.
> === rebased patches ===
> v22-0002: Update docs per recent changes. Same as v20-0002.
> v22-0003: Add check versions of the target. Extracted from v20-0003.
> v22-0004: Remove -S option. Mostly same as v20-0009, but commit massage was
>           slightly changed.
> === Newbie ===
> V22-0005: Addressed my comments which seems to be trivial[1].
>           Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
> v22-0006: Consider the scenario when commands are failed after the recovery.
>           drop_subscription() is removed and some messages are added per [2].
> V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, 7, were addressed here.
> V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
> V22-0009: Prohibit reuse publications when it has already existed. Per comments #11 and 12 [1].
> V22-0010: Avoid to call PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
> V22-0011: Update testcode. Per comments #17- [1].

Few comments on the tests:
1) If the dry run stopped the server because of some issue, this
connection will simply fail; we can instead check "pg_ctl status" to
verify that the server is still running. Another way would be to check
that the stdout does not contain "postmaster was stopped" messages.
+
+# Check if node S is still a standby
+is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
+       't', 'standby is in recovery');

2) Can we add verification of  "postmaster was stopped" messages in
the stdout for dry run without --databases testcase
+# pg_createsubscriber can run without --databases option
+command_ok(
+       [
+               'pg_createsubscriber', '--verbose',
+               '--dry-run', '--pgdata',
+               $node_s->data_dir, '--publisher-server',
+               $node_p->connstr('pg1'), '--subscriber-server',
+               $node_s->connstr('pg1')
+       ],
+       'run pg_createsubscriber without --databases');
+

3) This message "target server must be running" seems to be wrong,
should it be cannot specify cascading replicating standby as standby
node(this is for v22-0011 patch :
+               'pg_createsubscriber', '--verbose', '--pgdata',
$node_c->data_dir,
+               '--publisher-server', $node_s->connstr('postgres'),
+               '--port', $node_c->port, '--socketdir', $node_c->host,
+               '--database', 'postgres'
        ],
-       'primary server is in recovery');
+       1,
+       [qr//],
+       [qr/primary server cannot be in recovery/],
+       'target server must be running');

4) Should this be "Wait for subscriber to catch up"
+# Wait subscriber to catch up
+$node_s->wait_for_subscription_sync($node_p, $subnames[0]);
+$node_s->wait_for_subscription_sync($node_p, $subnames[1]);

5) Should this be 'Subscriptions has been created on all the specified
databases'
+);
+is($result, qq(2),
+       'Subscriptions has been created to all the specified databases'
+);

6) Add a test to verify that current_user is not a member of
ROLE_PG_CREATE_SUBSCRIPTION, has no create permissions, and has no
permission to execute replication origin advance functions

7) Add tests to verify insufficient max_logical_replication_workers,
max_replication_slots and max_worker_processes for the subscriber
node

8) Add tests to verify invalid configuration of wal_level,
max_replication_slots and max_wal_senders for the publisher node

9) We can use the same node name in the comment and for the variable
+# Set up node P as primary
+$node_p = PostgreSQL::Test::Cluster->new('node_p');
+$node_p->init(allows_streaming => 'logical');
+$node_p->start;

10) Similarly, we can use node_f instead of F in the comments.
+# Set up node F as about-to-fail node
+# Force it to initialize a new cluster instead of copying a
+# previously initdb'd cluster.
+{
+       local $ENV{'INITDB_TEMPLATE'} = undef;
+
+       $node_f = PostgreSQL::Test::Cluster->new('node_f');
+       $node_f->init(allows_streaming => 'logical');
+       $node_f->start;

Regards,
Vignesh



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shlok,

> Hi,
> 
> I have reviewed the v21 patch. And found an issue.
> 
> Initially I started the standby server with a new postgresql.conf file
> (not the default postgresql.conf that is present in the instance).
> pg_ctl -D ../standby start -o "-c config_file=/new_path/postgresql.conf"
> 
> And I have made 'max_replication_slots = 1' in new postgresql.conf and
> made  'max_replication_slots = 0' in the default postgresql.conf file.
> Now when we run pg_createsubscriber on standby we get error:
> pg_createsubscriber: error: could not set replication progress for the
> subscription "pg_createsubscriber_5_242843": ERROR:  cannot query or
> manipulate replication origin when max_replication_slots = 0
> NOTICE:  dropped replication slot "pg_createsubscriber_5_242843" on publisher
> pg_createsubscriber: error: could not drop publication
> "pg_createsubscriber_5" on database "postgres": ERROR:  publication
> "pg_createsubscriber_5" does not exist
> pg_createsubscriber: error: could not drop replication slot
> "pg_createsubscriber_5_242843" on database "postgres": ERROR:
> replication slot "pg_createsubscriber_5_242843" does not exist
> 
> I observed that when we run the pg_createsubscriber command, it will
> stop the standby instance (the non-default postgres configuration) and
> restart the standby instance which will now be started with default
> postgresql.conf, where the 'max_replication_slot = 0' and
> pg_createsubscriber will now fail with the error given above.
> I have added the script file with which we can reproduce this issue.
> Also similar issues can happen with other configurations such as port, etc.

Possible. So the issue is that GUC settings might change after the restart,
so the verification phase may not be enough. There are similar GUCs in [1]
that may have similar issues. E.g., if "hba_file" is set when the server
is started, the file cannot be seen after the restart, so pg_createsubscriber
may not be able to connect anymore.

> The possible solution would be
> 1) allow to run pg_createsubscriber if standby is initially stopped .
> I observed that pg_logical_createsubscriber also uses this approach.
> 2) read GUCs via SHOW command and restore them when server restarts
>

I also prefer the first solution.
Another reason why the standby should be stopped is for backup purposes.
Basically, the standby instance should be saved before running pg_createsubscriber.
The easiest way is a hard copy, and the postmaster should be stopped at that time.
I feel it is better if users can run the command immediately after the copying.
Thoughts?

[1]: https://www.postgresql.org/docs/current/storage-file-layout.html

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

Since no one has posted updates, here is a new patch set.
Note that comments from Peter E. were not addressed because
Euler seems to have already fixed them.

> 3) Error conditions is verbose mode has invalid error message like
> "out of memory" messages like in below:
> pg_createsubscriber: waiting the postmaster to reach the consistent state
> pg_createsubscriber: postmaster reached the consistent state
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: creating subscription
> "pg_createsubscriber_5_278343" on database "postgres"
> pg_createsubscriber: error: could not create subscription
> "pg_createsubscriber_5_278343" on database "postgres": out of memory

Descriptions were added.

> 4) In error cases we try to drop this publication again resulting in error:
> +               /*
> +                * Since the publication was created before the
> consistent LSN, it is
> +                * available on the subscriber when the physical
> replica is promoted.
> +                * Remove publications from the subscriber because it
> has no use.
> +                */
> +               drop_publication(conn, &dbinfo[i]);
> 
> Which throws these errors(because of drop publication multiple times):
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: error: could not drop publication
> "pg_createsubscriber_5" on database "postgres": ERROR:  publication
> "pg_createsubscriber_5" does not exist
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: dropping the replication slot
> "pg_createsubscriber_5_278343" on database "postgres"

Changed to ... IF EXISTS. I thought your other proposal was not good
because, if the flag was turned off, the publication on the publisher node
could not be dropped.

> 5) In error cases, wait_for_end_recovery waits even though it has
> identified that the replication between primary and standby is
> stopped:
> +/*
> + * Is recovery still in progress?
> + * If the answer is yes, it returns 1, otherwise, returns 0. If an error occurs
> + * while executing the query, it returns -1.
> + */
> +static int
> +server_is_in_recovery(PGconn *conn)
> +{
> +       PGresult   *res;
> +       int                     ret;
> +
> +       res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");
> +
> +       if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +       {
> +               PQclear(res);
> +               pg_log_error("could not obtain recovery progress");
> +               return -1;
> +       }
> +
> 
> You can simulate this by stopping the primary just before
> wait_for_end_recovery and you will see these error messages, but
> pg_createsubscriber will continue to wait:
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress

Based on an idea from Euler, I roughly implemented it. Thoughts?

0001-0013 were not changed from the previous version.

V24-0014: addressed your comment in the replied e-mail.
V24-0015: Add disconnect_database() again, per [3]
V24-0016: addressed your comment in [4].
V24-0017: addressed your comment in [5].
V24-0018: addressed your comment in [6].

[1]: https://www.postgresql.org/message-id/3ee79f2c-e8b3-4342-857c-a31b87e1afda%40eisentraut.org
[2]: https://www.postgresql.org/message-id/CALDaNm1ocVQmWhUJqxJDmR8N%3DCTbrH5GCdFU72ywnVRV6dND2A%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/202402201053.6jjvdrm7kahf%40alvherre.pgsql
[4]: https://www.postgresql.org/message-id/CALDaNm2qHuZZvh6ym6OM367RfozU7RaaRDSm%3DF8M3SNrcQG2pQ%40mail.gmail.com
[5]: https://www.postgresql.org/message-id/CALDaNm3Q5W%3DEvphDjHA1n8ii5fv2DvxVShSmQLNFgeiHsOUwPg%40mail.gmail.com
[6]: https://www.postgresql.org/message-id/CALDaNm1M73ds0GBxX-XZX56f1D%2BGPojeCCwo-DLTVnfu8DMAvw%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Alvaro,

> > 15.
> >
> > You said in case of failure, cleanups is not needed if the process exits soon [1].
> > But some functions call PQfinish() then exit(1) or pg_fatal(). Should we follow?
> 
> Hmm, but doesn't this mean that the server will log an ugly message that
> "client closed connection unexpectedly"?  I think it's nicer to close
> the connection before terminating the process (especially since the
> code for that is already written).

OK. So we should disconnect properly even if the process exits. I added the function call
again. Note that PQclear() was not added because it is only related to the application.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

> Few comments regarding the documentation:
> 1) max_replication_slots information seems to be present couple of times:
> 
> +    <para>
> +     The target instance must have
> +     <link
> linkend="guc-max-replication-slots"><varname>max_replication_slots</varna
> me></link>
> +     and <link
> linkend="guc-max-logical-replication-workers"><varname>max_logical_replica
> tion_workers</varname></link>
> +     configured to a value greater than or equal to the number of target
> +     databases.
> +    </para>
>
> +   <listitem>
> +    <para>
> +     The target instance must have
> +     <link
> linkend="guc-max-replication-slots"><varname>max_replication_slots</varna
> me></link>
> +     configured to a value greater than or equal to the number of target
> +     databases and replication slots.
> +    </para>
> +   </listitem>

Fixed.

> 2) Can we add an id to prerequisites and use it instead of referring
> to r1-app-pg_createsubscriber-1:
> -     <application>pg_createsubscriber</application> checks if the
> given target data
> -     directory has the same system identifier than the source data directory.
> -     Since it uses the recovery process as one of the steps, it starts the
> -     target server as a replica from the source server. If the system
> -     identifier is not the same,
> <application>pg_createsubscriber</application> will
> -     terminate with an error.
> +     Checks the target can be converted.  In particular, things listed in
> +     <link linkend="r1-app-pg_createsubscriber-1">above section</link>
> would be
> +     checked.  If these are not met
> <application>pg_createsubscriber</application>
> +     will terminate with an error.
>      </para>

Changed.

> 3) The code also checks the following:
>  Verify if a PostgreSQL binary (progname) is available in the same
> directory as pg_createsubscriber.
> 
> But this is not present in the pre-requisites of documentation.

I think it is quite trivial, so I did not add it.

> 4) Here we mention that the target server should be stopped, but the
> same is not mentioned in prerequisites:
> +   Here is an example of using
> <application>pg_createsubscriber</application>.
> +   Before running the command, please make sure target server is stopped.
> +<screen>
> +<prompt>$</prompt> <userinput>pg_ctl -D /usr/local/pgsql/data
> stop</userinput>
> +</screen>
> +

Oh, it is the opposite: it should NOT be stopped. Fixed.

> 5) If there is an error during any of the pg_createsubscriber
> operation like if create subscription fails, it might not be possible
> to rollback to the earlier state which had physical-standby
> replication. I felt we should document this and also add it to the
> console message like how we do in case of pg_upgrade.

Added.

New version can be available in [1]

[1]:
https://www.postgresql.org/message-id/TYCPR01MB12077CD333376B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

> Few comments:
> 1) The below code can lead to assertion failure if the publisher is
> stopped while dropping the replication slot:
> +       if (primary_slot_name != NULL)
> +       {
> +               conn = connect_database(dbinfo[0].pubconninfo);
> +               if (conn != NULL)
> +               {
> +                       drop_replication_slot(conn, &dbinfo[0],
> primary_slot_name);
> +               }
> +               else
> +               {
> +                       pg_log_warning("could not drop replication
> slot \"%s\" on primary",
> +                                                  primary_slot_name);
> +                       pg_log_warning_hint("Drop this replication
> slot soon to avoid retention of WAL files.");
> +               }
> +               disconnect_database(conn);
> +       }
> 
> pg_createsubscriber: error: connection to database failed: connection
> to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or
> directory
> Is the server running locally and accepting connections on that socket?
> pg_createsubscriber: warning: could not drop replication slot
> "standby_1" on primary
> pg_createsubscriber: hint: Drop this replication slot soon to avoid
> retention of WAL files.
> pg_createsubscriber: pg_createsubscriber.c:432: disconnect_database:
> Assertion `conn != ((void *)0)' failed.
> Aborted (core dumped)
> 
> This is happening because we are calling disconnect_database in case
> of connection failure case too which has the following assert:
> +static void
> +disconnect_database(PGconn *conn)
> +{
> +       Assert(conn != NULL);
> +
> +       PQfinish(conn);
> +}

Right. disconnect_database() was moved into the if (conn != NULL) block.

> 2) There is a CheckDataVersion function which does exactly this, will
> we be able to use this:
> +       /* Check standby server version */
> +       if ((ver_fd = fopen(versionfile, "r")) == NULL)
> +               pg_fatal("could not open file \"%s\" for reading: %m",
> versionfile);
> +
> +       /* Version number has to be the first line read */
> +       if (!fgets(rawline, sizeof(rawline), ver_fd))
> +       {
> +               if (!ferror(ver_fd))
> +                       pg_fatal("unexpected empty file \"%s\"", versionfile);
> +               else
> +                       pg_fatal("could not read file \"%s\": %m", versionfile);
> +       }
> +
> +       /* Strip trailing newline and carriage return */
> +       (void) pg_strip_crlf(rawline);
> +
> +       if (strcmp(rawline, PG_MAJORVERSION) != 0)
> +       {
> +               pg_log_error("standby server is of wrong version");
> +               pg_log_error_detail("File \"%s\" contains \"%s\",
> which is not compatible with this program's version \"%s\".",
> +                                                       versionfile,
> rawline, PG_MAJORVERSION);
> +               exit(1);
> +       }
> +
> +       fclose(ver_fd);


> 3) Should this be added to typedefs.list:
> +enum WaitPMResult
> +{
> +       POSTMASTER_READY,
> +       POSTMASTER_STILL_STARTING
> +};

But the comment from Peter E. [1] was the opposite. I did not handle this.

> 4) pgCreateSubscriber should be mentioned after pg_controldata to keep
> the ordering consistency:
> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
> index aa94f6adf6..c5edd244ef 100644
> --- a/doc/src/sgml/reference.sgml
> +++ b/doc/src/sgml/reference.sgml
> @@ -285,6 +285,7 @@
>     &pgCtl;
>     &pgResetwal;
>     &pgRewind;
> +   &pgCreateSubscriber;
>     &pgtestfsync;

This has already been pointed out by Peter E. I did not handle this.

> 5) Here pg_replication_slots should be pg_catalog.pg_replication_slots:
> +       if (primary_slot_name)
> +       {
> +               appendPQExpBuffer(str,
> +                                                 "SELECT 1 FROM
> pg_replication_slots "
> +                                                 "WHERE active AND
> slot_name = '%s'",
> +                                                 primary_slot_name);

Fixed.

> 6) Here pg_settings should be pg_catalog.pg_settings:
> +        * - max_worker_processes >= 1 + number of dbs to be converted
> +
> *------------------------------------------------------------------------
> +        */
> +       res = PQexec(conn,
> +                                "SELECT setting FROM pg_settings
> WHERE name IN ("
> +                                "'max_logical_replication_workers', "
> +                                "'max_replication_slots', "
> +                                "'max_worker_processes', "
> +                                "'primary_slot_name') "
> +                                "ORDER BY name");

Fixed.

New version can be available in [2]

[1]: https://www.postgresql.org/message-id/3ee79f2c-e8b3-4342-857c-a31b87e1afda%40eisentraut.org
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077CD333376B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

> Few comments on the tests:
> 1) If the dry run was successful because of some issue then the server
> will be stopped so we can check for "pg_ctl status" if the server is
> running otherwise the connection will fail in this case. Another way
> would be to check if it does not have "postmaster was stopped"
> messages in the stdout.
> +
> +# Check if node S is still a standby
> +is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
> +       't', 'standby is in recovery');

Just to confirm: your suggestion is to add `pg_ctl status`, right? Added.

> 2) Can we add verification of  "postmaster was stopped" messages in
> the stdout for dry run without --databases testcase
> +# pg_createsubscriber can run without --databases option
> +command_ok(
> +       [
> +               'pg_createsubscriber', '--verbose',
> +               '--dry-run', '--pgdata',
> +               $node_s->data_dir, '--publisher-server',
> +               $node_p->connstr('pg1'), '--subscriber-server',
> +               $node_s->connstr('pg1')
> +       ],
> +       'run pg_createsubscriber without --databases');
> +

Hmm, in the case of --dry-run, the server would never be shut down.
See the code below:

```
    if (!dry_run)
        stop_standby_server(pg_ctl_path, opt.subscriber_dir);
```

> 3) This message "target server must be running" seems to be wrong,
> should it be cannot specify cascading replicating standby as standby
> node(this is for v22-0011 patch :
> +               'pg_createsubscriber', '--verbose', '--pgdata',
> $node_c->data_dir,
> +               '--publisher-server', $node_s->connstr('postgres'),
> +               '--port', $node_c->port, '--socketdir', $node_c->host,
> +               '--database', 'postgres'
>         ],
> -       'primary server is in recovery');
> +       1,
> +       [qr//],
> +       [qr/primary server cannot be in recovery/],
> +       'target server must be running');

Fixed.

> 4) Should this be "Wait for subscriber to catch up"
> +# Wait subscriber to catch up
> +$node_s->wait_for_subscription_sync($node_p, $subnames[0]);
> +$node_s->wait_for_subscription_sync($node_p, $subnames[1]);

Fixed.

> 5) Should this be 'Subscriptions has been created on all the specified
> databases'
> +);
> +is($result, qq(2),
> +       'Subscriptions has been created to all the specified databases'
> +);

Fixed, but "has" should be "have".

> 6) Add test to verify current_user is not a member of
> ROLE_PG_CREATE_SUBSCRIPTION, has no create permissions, has no
> permissions to execution replication origin advance functions
> 
> 7) Add tests to verify insufficient max_logical_replication_workers,
> max_replication_slots and max_worker_processes for the subscription
> node
> 
> 8) Add tests to verify invalid configuration of  wal_level,
> max_replication_slots and max_wal_senders for the publisher node

Hmm, but adding these checks may increase the test time. We should
measure the time and then decide.

> 9) We can use the same node name in comment and for the variable
> +# Set up node P as primary
> +$node_p = PostgreSQL::Test::Cluster->new('node_p');
> +$node_p->init(allows_streaming => 'logical');
> +$node_p->start;

Fixed.

> 10) Similarly we can use node_f instead of F in the comments.
> +# Set up node F as about-to-fail node
> +# Force it to initialize a new cluster instead of copying a
> +# previously initdb'd cluster.
> +{
> +       local $ENV{'INITDB_TEMPLATE'} = undef;
> +
> +       $node_f = PostgreSQL::Test::Cluster->new('node_f');
> +       $node_f->init(allows_streaming => 'logical');
> +       $node_f->start;
>

Fixed. Also, a recent commit [1] allows forcing initdb to initialize a new cluster, so I followed that.

The new patch is available in [2].

[1]: https://github.com/postgres/postgres/commit/ff9e1e764fcce9a34467d614611a34d4d2a91b50
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077CD333376B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
'Alvaro Herrera'
Date:
Hello,

On 2024-Feb-22, Hayato Kuroda (Fujitsu) wrote:

> Dear Alvaro,

> > Hmm, but doesn't this mean that the server will log an ugly message
> > that "client closed connection unexpectedly"?  I think it's nicer to
> > close the connection before terminating the process (especially
> > since the code for that is already written).
> 
> OK. So we should disconnect properly even if the process exits. I
> added the function call again. Note that PQclear() was not added
> because it is only related with the application.

Sounds about right, but I didn't verify the patches in detail.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)



Re: speed up a logical replica setup

From
vignesh C
Date:
On Thu, 22 Feb 2024 at 21:17, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> > Few comments on the tests:
> > 1) If the dry run was successful because of some issue then the server
> > will be stopped so we can check for "pg_ctl status" if the server is
> > running otherwise the connection will fail in this case. Another way
> > would be to check if it does not have "postmaster was stopped"
> > messages in the stdout.
> > +
> > +# Check if node S is still a standby
> > +is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
> > +       't', 'standby is in recovery');
>
> Just to confirm - your suggestion is to add `pg_ctl status`, right? Added.

Yes, that is correct.

> > 2) Can we add verification of  "postmaster was stopped" messages in
> > the stdout for dry run without --databases testcase
> > +# pg_createsubscriber can run without --databases option
> > +command_ok(
> > +       [
> > +               'pg_createsubscriber', '--verbose',
> > +               '--dry-run', '--pgdata',
> > +               $node_s->data_dir, '--publisher-server',
> > +               $node_p->connstr('pg1'), '--subscriber-server',
> > +               $node_s->connstr('pg1')
> > +       ],
> > +       'run pg_createsubscriber without --databases');
> > +
>
> Hmm, in case of --dry-run, the server would be never shut down.
> See below part.
>
> ```
>         if (!dry_run)
>                 stop_standby_server(pg_ctl_path, opt.subscriber_dir);
> ```

One way to tell whether the server ran in dry-run mode is to check
whether the server was stopped. That is, we can check that the stdout
does not contain a "postmaster was stopped" message. Can we add
validation based on this code:
+       if (action)
+               pg_log_info("postmaster was started");

Alternatively, we can check `pg_ctl status` to confirm that the server has not been shut down.

> > 3) This message "target server must be running" seems to be wrong,
> > should it be cannot specify cascading replicating standby as standby
> > node(this is for v22-0011 patch :
> > +               'pg_createsubscriber', '--verbose', '--pgdata',
> > $node_c->data_dir,
> > +               '--publisher-server', $node_s->connstr('postgres'),
> > +               '--port', $node_c->port, '--socketdir', $node_c->host,
> > +               '--database', 'postgres'
> >         ],
> > -       'primary server is in recovery');
> > +       1,
> > +       [qr//],
> > +       [qr/primary server cannot be in recovery/],
> > +       'target server must be running');
>
> Fixed.
>
> > 4) Should this be "Wait for subscriber to catch up"
> > +# Wait subscriber to catch up
> > +$node_s->wait_for_subscription_sync($node_p, $subnames[0]);
> > +$node_s->wait_for_subscription_sync($node_p, $subnames[1]);
>
> Fixed.
>
> > 5) Should this be 'Subscriptions has been created on all the specified
> > databases'
> > +);
> > +is($result, qq(2),
> > +       'Subscriptions has been created to all the specified databases'
> > +);
>
> Fixed, but "has" should be "have".
>
> > 6) Add test to verify current_user is not a member of
> > ROLE_PG_CREATE_SUBSCRIPTION, has no create permissions, has no
> > permissions to execution replication origin advance functions
> >
> > 7) Add tests to verify insufficient max_logical_replication_workers,
> > max_replication_slots and max_worker_processes for the subscription
> > node
> >
> > 8) Add tests to verify invalid configuration of  wal_level,
> > max_replication_slots and max_wal_senders for the publisher node
>
> Hmm, but adding these checks may increase the test time. we should
> measure the time and then decide.

We can check; if it does not take significantly more time, then
we can add these tests.

Regards,
Vignesh



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Feb 22, 2024, at 9:43 AM, Hayato Kuroda (Fujitsu) wrote:
> The possible solution would be
> 1) allow to run pg_createsubscriber if standby is initially stopped .
> I observed that pg_logical_createsubscriber also uses this approach.
> 2) read GUCs via SHOW command and restore them when server restarts
>

3. add a config-file option. That's similar to what pg_rewind does. I expect
that Debian-based installations will have this issue.

I also prefer the first solution.
Another reason the standby should be stopped is backups.
Basically, the standby instance should be saved before running pg_createsubscriber.
The easiest way is a hard copy, and the postmaster should be stopped at that time.
I felt it is better if users can run the command immediately after copying.
Thoughts?

It is not a good idea if you want to keep postgresql.conf outside PGDATA.
I mean, you need extra steps that can be error-prone (different settings between
files).

Shlok, I didn't read your previous email carefully. :-/


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote:
I found some issues and fixed them with the top-up patches
v23-0012 and v23-0013.
1.
Suppose there is a cascading physical replication setup node1->node2->node3.
Now if we run pg_createsubscriber with node1 as primary and node2 as
standby, pg_createsubscriber will succeed, but the connection
between node2 and node3 will not be retained, and the log of node3 will
show this error:
2024-02-20 12:32:12.340 IST [277664] FATAL:  database system
identifier differs between the primary and standby
2024-02-20 12:32:12.340 IST [277664] DETAIL:  The primary's identifier
is 7337575856950914038, the standby's identifier is
7337575783125171076.
2024-02-20 12:32:12.341 IST [277491] LOG:  waiting for WAL to become
available at 0/3000F10

To fix this, I prevent pg_createsubscriber from running if the standby
node is a primary for any other server.
Made the change in the v23-0012 patch.

IIRC we already discussed the cascading replication scenario. Of course,
breaking a node is not good; that's why you proposed v23-0012. However,
preventing pg_createsubscriber from running if there are standbys attached is
also annoying. If you don't have access to these hosts you need to (a) kill the
walsender (very fragile / unstable), (b) start with max_wal_senders = 0, or (c)
add a firewall rule to prevent these hosts from establishing a connection
to the target server. I wouldn't like to include the patch as-is. IMO we need
at least one message explaining the situation to the user, I mean, a hint
message.  I'm resistant to a new option, but perhaps a --force option is an
answer. There is no test coverage for it. I adjusted this patch (didn't include
the --force option) and added a test case.

2.
While checking 'max_replication_slots' in the 'check_publisher' function,
we do not account for the temporary slot in the check:
+   if (max_repslots - cur_repslots < num_dbs)
+   {
+       pg_log_error("publisher requires %d replication slots, but
only %d remain",
+                    num_dbs, max_repslots - cur_repslots);
+       pg_log_error_hint("Consider increasing max_replication_slots
to at least %d.",
+                         cur_repslots + num_dbs);
+       return false;
+   }
Fixed this in v23-0013

Good catch!

Both are included in the next patch.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Feb 23, 2024 at 8:16 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote:
>
> I found some issues and fixed those issues with top up patches
> v23-0012 and v23-0013
> 1.
> Suppose there is a cascade physical replication node1->node2->node3.
> Now if we run pg_createsubscriber with node1 as primary and node2 as
> standby, pg_createsubscriber will be successful but the connection
> between node2 and node3 will not be retained and log og node3 will
> give error:
> 2024-02-20 12:32:12.340 IST [277664] FATAL:  database system
> identifier differs between the primary and standby
> 2024-02-20 12:32:12.340 IST [277664] DETAIL:  The primary's identifier
> is 7337575856950914038, the standby's identifier is
> 7337575783125171076.
> 2024-02-20 12:32:12.341 IST [277491] LOG:  waiting for WAL to become
> available at 0/3000F10
>
> To fix this I am avoiding pg_createsubscriber to run if the standby
> node is primary to any other server.
> Made the change in v23-0012 patch
>
>
> IIRC we already discussed the cascading replication scenario. Of course,
> breaking a node is not good that's why you proposed v23-0012. However,
> preventing pg_createsubscriber to run if there are standbys attached to it is
> also annoying. If you don't access to these hosts you need to (a) kill
> walsender (very fragile / unstable), (b) start with max_wal_senders = 0 or (3)
> add a firewall rule to prevent that these hosts do not establish a connection
> to the target server. I wouldn't like to include the patch as-is. IMO we need
> at least one message explaining the situation to the user, I mean, add a hint
> message.
>

Yeah, it could be a bit tricky for users to ensure that no follow-on
standby is present, but I think it is still better to give an error and
prohibit running pg_createsubscriber than to break the existing
replication. A possible solution, in this case, is to allow running
pg_basebackup via this tool or otherwise, and then let the user use it
to convert to a subscriber. It would be good to keep things simple for
the first version; then we can add options like --force in
subsequent versions.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

I forgot to reply one of the suggestion.

> 2) There is a CheckDataVersion function which does exactly this, will
> we be able to use this:
> +       /* Check standby server version */
> +       if ((ver_fd = fopen(versionfile, "r")) == NULL)
> +               pg_fatal("could not open file \"%s\" for reading: %m",
> versionfile);
> +
> +       /* Version number has to be the first line read */
> +       if (!fgets(rawline, sizeof(rawline), ver_fd))
> +       {
> +               if (!ferror(ver_fd))
> +                       pg_fatal("unexpected empty file \"%s\"", versionfile);
> +               else
> +                       pg_fatal("could not read file \"%s\": %m", versionfile);
> +       }
> +
> +       /* Strip trailing newline and carriage return */
> +       (void) pg_strip_crlf(rawline);
> +
> +       if (strcmp(rawline, PG_MAJORVERSION) != 0)
> +       {
> +               pg_log_error("standby server is of wrong version");
> +               pg_log_error_detail("File \"%s\" contains \"%s\",
> which is not compatible with this program's version \"%s\".",
> +                                                       versionfile,
> rawline, PG_MAJORVERSION);
> +               exit(1);
> +       }
> +
> +       fclose(ver_fd);


I did not adopt this suggestion because I was not sure where to put the
declaration and the implementation. Functions that can be called from client
programs must be declared in the src/include/{common|fe_utils|utils} directories.
I looked at the files located there but could not find an appropriate location for
CheckDataVersion. Also, I did not think a new file should be created only for this function.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

>> The possible solution would be
>> 1) allow to run pg_createsubscriber if standby is initially stopped .
>> I observed that pg_logical_createsubscriber also uses this approach.
>> 2) read GUCs via SHOW command and restore them when server restarts
>>
>3. add a config-file option. That's similar to what pg_rewind does.

Sorry, which pg_rewind option did you mean? I cannot find it.
IIUC, -l is the only option that accepts a path, but it is not related to our case.

Also, I'm not sure of the benefit of adding new options; basically, there should be
as few as possible. Is there any benefit over reading via SHOW? Even if pg_resetwal
had such an option, the reason would be that the target postmaster for pg_resetwal must be stopped.

>I expect
>that Debian-based installations will have this issue.

I'm not familiar with the Debian environment, so could you explain the reason?

>It was not a good idea if you want to keep the postgresql.conf outside PGDATA.
>I mean you need extra steps that can be error prone (different settings between
>files).

Yeah, if we use my approach, users who specify such GUCs may not be happy.
So, based on the above discussion, we should choose one of the items below. Thoughts?

a)
require that the standby be *stopped*; options like config_file can be specified via command-line options.
b)
require that the standby be *running*; options like config_file would be read via the SHOW command.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

> Sorry, which pg_rewind option did you mention? I cannot find.
> IIUC, -l is an only option which can accept the path, but it is not related with us.

Sorry, I found it [1]. I had confused it with pg_resetwal, but my opinion has not really changed.
The reason --config-file exists is that pg_rewind requires the target to be stopped,
which differs from the current pg_createsubscriber. So I still prefer not to add the option.

[1]: https://www.postgresql.org/docs/current/app-pgrewind.html#:~:text=%2D%2Dconfig%2Dfile%3Dfilename
[2]:
```
The target server must be shut down cleanly before running pg_rewind
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
Andrew Dunstan
Date:
On 2024-02-27 Tu 05:02, Hayato Kuroda (Fujitsu) wrote:
> Dear Euler,
>
>> Sorry, which pg_rewind option did you mention? I cannot find.
>> IIUC, -l is an only option which can accept the path, but it is not related with us.
> Sorry, I found [1]. I was confused with pg_resetwal. But my opinion was not so changed.
> The reason why --config-file exists is that pg_rewind requires that target must be stopped,
> which is different from the current pg_createsubscriber. So I still do not like to add options.
>
> [1]: https://www.postgresql.org/docs/current/app-pgrewind.html#:~:text=%2D%2Dconfig%2Dfile%3Dfilename
> [2]:
> ```
> The target server must be shut down cleanly before running pg_rewind
> ```
>

Even though that is a difference, I'd still rather we did more or less
the same thing the same way across utilities, so I agree
with Euler's suggestion.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote:
Based on Euler's idea, I roughly implemented it. Thoughts?

0001-0013 were not changed from the previous version.

V24-0014: addressed your comment in the replied e-mail.
V24-0015: Add disconnect_database() again, per [3]
V24-0016: addressed your comment in [4].
V24-0017: addressed your comment in [5].
V24-0018: addressed your comment in [6].

Thanks for your review. I'm attaching v25 that hopefully addresses all pending
points.

Regarding your comments [1] on v21, I included changes for almost all items
except 2, 20, 23, 24, and 25. I still think item 2 is not required because
pg_ctl will provide a suitable message. I decided not to rearrange the block of
SQL commands (item 20) mainly because it would avoid these objects on node_f.
Do we really need command_checks_all? Depending on the output, it uses
more cycles than command_ok.

In summary:

v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, I
checked what you wrote and fixed some gaps in what I had written.
v24-0003: as I said I don't think we need to add it, however, I won't fight
against it if people want to add this check.
v24-0004: I spent some time on it. This patch was not complete. I cleaned it up
and included the start_standby_server code. It starts the server using the
specified socket directory, port and username, hence preventing external client
connections during the execution.
v24-0005: partially applied
v24-0006: applied with cosmetic change
v24-0007: applied with cosmetic change
v24-0008: applied
v24-0009: applied with cosmetic change
v24-0010: not applied. Based on #15, I refactored this code a bit. pg_fatal is
not used when there is a database connection open. Instead, pg_log_error() is
followed by disconnect_database(). In cases where it should exit immediately,
disconnect_database() has a new parameter (exit_on_error) that controls whether it
needs to call exit(1). I went ahead and did the same for connect_database().
v24-0011: partially applied. I included some of the suggestions (18, 19, and 21).
v24-0012: not applied. On reflection, after working on v24-0004, the target
server will start with new parameters (that only accept local connections);
hence, during the execution it is no longer possible to detect whether the target
server is a primary for another server. I added a sentence about it in the
documentation (Warning section).
v24-0013: good catch. Applied.
v24-0014: partially applied. After some experiments I decided to use a small
number of attempts. The current code didn't reset the counter if the connection
is reestablished. I included the documentation suggestion. I didn't include the
IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead,
I refactored the drop_publication() to not try again if the DROP PUBLICATION
failed.
v24-0015: not applied. I refactored the exit code to do the right thing: print
error message, disconnect database (if applicable) and exit.
v24-0016: not applied. But checked if the information was presented in the
documentation; it is.
v24-0017: good catch. Applied.
v24-0018: not applied.

I removed almost all boolean returns and included the error logic inside the
functions. It reads better. I also changed the connect|disconnect_database
functions to include the error logic inside them. There is a new parameter
on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I
think someone suggested it -- and the error message was moved outside the
function. I improved the cleanup routine. It provides useful information if it
cannot remove the object (publication or replication slot) from the primary.

Since I applied v24-0004, I realized that extra service starts / stops are
required. It means pg_createsubscriber doesn't start the transformation with the
current standby settings. Instead, it stops the standby if it is running and
starts it with the provided command-line options (socket, port,
listen_addresses). It has a few drawbacks:
* See v24-0012. It cannot detect if the target server is a primary for another
  server. It is documented.
* I also removed the check that the standby is running. If the standby was stopped a
  long time ago, it will take some time to reach the start point.
* Dry run mode has to start / stop the service to work correctly. Is that an
  issue?

Although I decided to include the --retain option, I'm thinking about removing it.
If logging is enabled, the information from the pg_createsubscriber run will
be available. The client log can be redirected to a file for future inspection.

Comments?




--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Sat, Mar 2, 2024 at 2:19 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote:
>
> Based on idea from Euler, I roughly implemented. Thought?
>
> 0001-0013 were not changed from the previous version.
>
> V24-0014: addressed your comment in the replied e-mail.
> V24-0015: Add disconnect_database() again, per [3]
> V24-0016: addressed your comment in [4].
> V24-0017: addressed your comment in [5].
> V24-0018: addressed your comment in [6].
>
>
> Thanks for your review. I'm attaching v25 that hopefully addresses all pending
> points.
>
> Regarding your comments [1] on v21, I included changes for almost all items
> except 2, 20, 23, 24, and 25. It still think item 2 is not required because
> pg_ctl will provide a suitable message. I decided not to rearrange the block of
> SQL commands (item 20) mainly because it would avoid these objects on node_f.
> Do we really need command_checks_all? Depending on the output it uses
> additional cycles than command_ok.
>
> In summary:
>
> v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, I
> checked what you wrote and fix some gaps in what I've been written.
> v24-0003: as I said I don't think we need to add it, however, I won't fight
> against it if people want to add this check.
> v24-0004: I spent some time on it. This patch is not completed. I cleaned it up
> and include the start_standby_server code. It starts the server using the
> specified socket directory, port and username, hence, preventing external client
> connections during the execution.
> v24-0005: partially applied
> v24-0006: applied with cosmetic change
> v24-0007: applied with cosmetic change
> v24-0008: applied
> v24-0009: applied with cosmetic change
> v24-0010: not applied. Base on #15, I refactored this code a bit. pg_fatal is
> not used when there is a database connection open. Instead, pg_log_error()
> followed by disconnect_database(). In cases that it should exit immediately,
> disconnect_database() has a new parameter (exit_on_error) that controls if it
> needs to call exit(1). I go ahead and did the same for connect_database().
> v24-0011: partially applied. I included some of the suggestions (18, 19, and 21).
> v24-0012: not applied. Under reflection, after working on v24-0004, the target
> server will start with new parameters (that only accepts local connections),
> hence, during the execution is not possible anymore to detect if the target
> server is a primary for another server. I added a sentence for it in the
> documentation (Warning section).
> v24-0013: good catch. Applied.
> v24-0014: partially applied. After some experiments I decided to use a small
> number of attempts. The current code didn't reset the counter if the connection
> is reestablished. I included the documentation suggestion. I didn't include the
> IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead,
> I refactored the drop_publication() to not try again if the DROP PUBLICATION
> failed.
> v24-0015: not applied. I refactored the exit code to do the right thing: print
> error message, disconnect database (if applicable) and exit.
> v24-0016: not applied. But checked if the information was presented in the
> documentation; it is.
> v24-0017: good catch. Applied.
> v24-0018: not applied.
>
> I removed almost all boolean return and include the error logic inside the
> function. It reads better. I also changed the connect|disconnect_database
> functions to include the error logic inside it. There is a new parameter
> on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I
> think someone suggested it -- and the error message was moved to outside the
> function. I improved the cleanup routine. It provides useful information if it
> cannot remove the object (publication or replication slot) from the primary.
>
> Since I applied v24-0004, I realized that extra start / stop service are
> required. It mean pg_createsubscriber doesn't start the transformation with the
> current standby settings. Instead, it stops the standby if it is running and
> start it with the provided command-line options (socket, port,
> listen_addresses). It has a few drawbacks:
> * See v34-0012. It cannot detect if the target server is a primary for another
>   server. It is documented.
> * I also removed the check for standby is running. If the standby was stopped a
>   long time ago, it will take some time to reach the start point.
> * Dry run mode has to start / stop the service to work correctly. Is it an
>   issue?
>
> However, I decided to include --retain option, I'm thinking about to remove it.
> If the logging is enabled, the information during the pg_createsubscriber will
> be available. The client log can be redirected to a file for future inspection.
>
> Comments?

I applied the v25 patch and ran the 'pg_createsubscriber' command.
I was unable to execute it and got the following error:

$ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432
dbname=postgres"  -d postgres -d db1 -p 9000 -r
./pg_createsubscriber: invalid option -- 'p'
pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more
information.

Here, -p is not accepting the desired port number. Thoughts?

Thanks and Regards,
Shubham Khanna.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,
> I applied the v25 patch and did RUN the 'pg_createsubscriber' command.
> I was unable to execute it and experienced the following error:
>
> $ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432
> dbname=postgres"  -d postgres -d db1 -p 9000 -r
> ./pg_createsubscriber: invalid option -- 'p'
> pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more
> information.
>
> Here, the --p is not accepting the desired port number. Thoughts?

I investigated it and found that:

+ while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v",
+             long_options, &option_index)) != -1)
+ {

Here the 'p', 's' and 'U' options are missing, so we get the error.
Also, I think the 'S' option should be removed from here.

I also confirmed that specifying the long options --subscriber-port,
--subscriber-username and --socket-directory works fine.
Thoughts?

Thanks and regards,
Shlok Kyal



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Mar 5, 2024, at 12:48 AM, Shubham Khanna wrote:
I applied the v25 patch and ran the 'pg_createsubscriber' command.
I was unable to execute it and got the following error:

$ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432
dbname=postgres"  -d postgres -d db1 -p 9000 -r
./pg_createsubscriber: invalid option -- 'p'
pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more
information.

Oops. Good catch! I will post an updated patch soon.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch!

>v24-0003: as I said I don't think we need to add it, however, I won't fight
>against it if people want to add this check.

OK, let's wait for comments from senior members.

>Since I applied v24-0004, I realized that extra start / stop service are
>required. It mean pg_createsubscriber doesn't start the transformation with the
>current standby settings. Instead, it stops the standby if it is running and
>start it with the provided command-line options (socket, port,
>listen_addresses). It has a few drawbacks:
>* See v34-0012. It cannot detect if the target server is a primary for another
>  server. It is documented.

Yeah, it is collateral damage.

>* I also removed the check for standby is running. If the standby was stopped a
>  long time ago, it will take some time to reach the start point.
>* Dry run mode has to start / stop the service to work correctly. Is it an
>  issue?

One concern (see the comment below) is that the -l option would not be passed even if
the standby had been logging before running pg_createsubscriber. Also, some settings
passed via pg_ctl start -o ... would not be restored.

>However, I decided to include --retain option, I'm thinking about to remove it.
>If the logging is enabled, the information during the pg_createsubscriber will
>be available. The client log can be redirected to a file for future inspection.

Just to confirm, you meant the following, right?
* the client output would be redirected, and
* -r option would be removed.

Here are my initial comments for v25-0001. I read the new doc and it looks very good.
I may review v25-0001 further, but feel free to revise.

01. cleanup_objects_atexit
```
    PGconn       *conn;
    int            i;
```
The declaration of *conn can be moved into the for loop. Also, the loop index can be declared inside the for statement.

02. cleanup_objects_atexit
```

                /*
                 * If a connection could not be established, inform the user
                 * that some objects were left on primary and should be
                 * removed before trying again.
                 */
                if (dbinfo[i].made_publication)
                {
                    pg_log_warning("There might be a publication \"%s\" in database \"%s\" on primary",
                                   dbinfo[i].pubname, dbinfo[i].dbname);
                    pg_log_warning_hint("Consider dropping this publication before trying again.");
                }
                if (dbinfo[i].made_replslot)
                {
                    pg_log_warning("There might be a replication slot \"%s\" in database \"%s\" on primary",
                                   dbinfo[i].subname, dbinfo[i].dbname);
                    pg_log_warning_hint("Drop this replication slot soon to avoid retention of WAL files.");
                }
```

Not sure which is better, but we may be able to write the list to a concrete file, like pg_upgrade does.
(I thought this had already been discussed, but I could not find it in the archive. Sorry if this is a duplicate comment.)

03. main
```
    while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v",
                            long_options, &option_index)) != -1)
```

Missing update for __shortopts.

04. main
```
            case 'D':
                opt.subscriber_dir = pg_strdup(optarg);
                canonicalize_path(opt.subscriber_dir);
                break;
...
            case 'P':
                opt.pub_conninfo_str = pg_strdup(optarg);
                break;
...
            case 's':
                opt.socket_dir = pg_strdup(optarg);
                break;
...
            case 'U':
                opt.sub_username = pg_strdup(optarg);
                break;
```

Should we consider the case where these options are specified twice?
I.e., should we call pg_free() before the assignment?
 
05. main

Missing canonicalize_path() to the socket_dir.

06. main
```
    /*
     * If socket directory is not provided, use the current directory.
     */
```

A one-line comment can be used here; the trailing period can also be removed at that time.

07. main
```
    /*
     *
     * If subscriber username is not provided, check if the environment
     * variable sets it. If not, obtain the operating system name of the user
     * running it.
     */
```
Unnecessary blank.

08. main
```
        char       *errstr = NULL;
```
 
This declaration can be at else-part.

09. main.

Also, in the first place, do we have to obtain the username if it is not specified?
I think libpq can handle that case if we simply skip passing the info.

10. main
```
    appendPQExpBuffer(sub_conninfo_str, "host=%s port=%u user=%s fallback_application_name=%s",
                      opt.socket_dir, opt.sub_port, opt.sub_username, progname);
    sub_base_conninfo = get_base_conninfo(sub_conninfo_str->data, NULL);
```

Is it really necessary to call get_base_conninfo()? I think there is no need to
define sub_base_conninfo.

11. main

```
    /*
     * In dry run mode, the server is restarted with the provided command-line
     * options so validation can be applied in the target server. In order to
     * preserve the initial state of the server (running), start it without
     * the command-line options.
     */
    if (dry_run)
        start_standby_server(&opt, pg_ctl_path, NULL, false);
```

I think the initial state of the server may also be stopped; currently both conditions are allowed.
And I think it is not good to leave the logfile unspecified.

12. others

As Peter E pointed out [1], the main function is still huge; it has more than 400 lines.
I think all functions should have fewer than 100 lines to keep them readable.

I considered a separation like the one below. Note that this may require
changing the ordering. What do you think?

* add parse_command_options() which accepts user options and verifies them
* add verification_phase() or something which checks system identifier and calls check_XXX
* add catchup_phase() or something which creates a temporary slot, writes recovery parameters,
  and wait until the end of recovery
* add cleanup_phase() or something which removes primary_slot and modifies the
  system identifier
* stop/start server can be combined into one wrapper.

The attached txt file is a proof of concept.

13. others

PQresultStatus(res) is called 17 times in this source code, which seems redundant.
I think we can introduce a function like executeQueryOrDie() and gather the checks in one place.

14. others

I found that pg_createsubscriber does not use functions declared in other files.
Is there a possibility of reusing some, e.g., from streamutils.h?

15. others 

While reading the old discussions [2], I saw that Amit suggested keeping the comment
and avoiding the creation of a temporary slot. You said "Got it", but the temporary
slot still exists. Is there a reason? Could you clarify your opinion?

16. others

While reading [2] and [3], I was confused by the decision. You and Amit discussed
combining pg_createsubscriber with slot sync and how to handle slots on the
physical standby. You seemed to agree to remove such a slot, and Amit also
suggested raising an ERROR. However, you said in [8] that such handling is not
mandatory, so a WARNING should be raised in dry_run mode. I was quite confused.
Am I missing something?

17. others

Per the discussion around [4], we might have to consider the case where some options
like data_directory and config_file were initially specified for the standby server.
Another easy approach is to allow users to specify options like -o in pg_upgrade [5],
which is similar to your idea. Thoughts?


18. others

How do you handle the reported failure [6]?

19. main
```
    char       *pub_base_conninfo = NULL;
    char       *sub_base_conninfo = NULL;
    char       *dbname_conninfo = NULL;
```

No need to initialize pub_base_conninfo and sub_base_conninfo.
These variables would not be free'd.

20. others

IIUC, slot creation will not finish if there are prepared transactions.
Should we detect this in the verification phase and raise an ERROR?

21. others

As I said in [7], the catch-up would not finish if a long recovery_min_apply_delay
is used. Should we override it during the catch-up?

22. pg_createsubscriber.sgml
```
    <para>
     Check
     Write recovery parameters into the target data...
```

I'm not sure, but "Check" seems unneeded.

[1]: https://www.postgresql.org/message-id/b9aa614c-84ba-a869-582f-8d5e3ab57424%40enterprisedb.com
[2]: https://www.postgresql.org/message-id/9fd3018d-0e5f-4507-aee6-efabfb5a4440%40app.fastmail.com
[3]: https://www.postgresql.org/message-id/CAA4eK1L%2BE-bdKaOMSw-yWizcuprKMyeejyOwWjq_57%3DUqh-f%2Bg%40mail.gmail.com
[4]:
https://www.postgresql.org/message-id/TYCPR01MB12077B63D81B49E9DFD323661F55A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[5]:
https://www.postgresql.org/docs/devel/pgupgrade.html#:~:text=options%20to%20be%20passed%20directly%20to%20the%20old%20postgres%20command%3B%20multiple%20option%20invocations%20are%20appended
[6]: https://www.postgresql.org/message-id/CAHv8Rj%2B5mzK9Jt%2B7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g%40mail.gmail.com
[7]:
https://www.postgresql.org/message-id/OS3PR01MB98828B15DD9502C91E0C50D7F57D2%40OS3PR01MB9882.jpnprd01.prod.outlook.com
[8]: https://www.postgresql.org/message-id/be92c57b-82e1-4920-ac31-a8a04206db7b%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Sat, 2 Mar 2024 at 02:19, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote:
>
> Based on an idea from Euler, I roughly implemented it. Thoughts?
>
> 0001-0013 were not changed from the previous version.
>
> V24-0014: addressed your comment in the replied e-mail.
> V24-0015: Add disconnect_database() again, per [3]
> V24-0016: addressed your comment in [4].
> V24-0017: addressed your comment in [5].
> V24-0018: addressed your comment in [6].
>
>
> Thanks for your review. I'm attaching v25 that hopefully addresses all pending
> points.
>
> Regarding your comments [1] on v21, I included changes for almost all items
> except 2, 20, 23, 24, and 25. I still think item 2 is not required because
> pg_ctl will provide a suitable message. I decided not to rearrange the block of
> SQL commands (item 20), mainly because it would avoid creating these objects on
> node_f. Do we really need command_checks_all? Depending on the output, it uses
> more cycles than command_ok.
>
> In summary:
>
> v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, I
> checked what you wrote and fixed some gaps in what I had written.
> v24-0003: as I said I don't think we need to add it, however, I won't fight
> against it if people want to add this check.
> v24-0004: I spent some time on it. This patch is not completed. I cleaned it up
> and include the start_standby_server code. It starts the server using the
> specified socket directory, port and username, hence, preventing external client
> connections during the execution.
> v24-0005: partially applied
> v24-0006: applied with cosmetic change
> v24-0007: applied with cosmetic change
> v24-0008: applied
> v24-0009: applied with cosmetic change
> v24-0010: not applied. Based on #15, I refactored this code a bit. pg_fatal is
> not used when there is a database connection open. Instead, pg_log_error()
> followed by disconnect_database(). In cases that it should exit immediately,
> disconnect_database() has a new parameter (exit_on_error) that controls if it
> needs to call exit(1). I go ahead and did the same for connect_database().
> v24-0011: partially applied. I included some of the suggestions (18, 19, and 21).
> v24-0012: not applied. On reflection, after working on v24-0004, the target
> server will start with new parameters (that only accept local connections);
> hence, during the execution it is no longer possible to detect whether the
> target server is a primary for another server. I added a sentence about it in
> the documentation (Warning section).
> v24-0013: good catch. Applied.
> v24-0014: partially applied. After some experiments I decided to use a small
> number of attempts. The current code didn't reset the counter if the connection
> is reestablished. I included the documentation suggestion. I didn't include the
> IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead,
> I refactored the drop_publication() to not try again if the DROP PUBLICATION
> failed.
> v24-0015: not applied. I refactored the exit code to do the right thing: print
> error message, disconnect database (if applicable) and exit.
> v24-0016: not applied. But checked if the information was presented in the
> documentation; it is.
> v24-0017: good catch. Applied.
> v24-0018: not applied.
>
> I removed almost all boolean return and include the error logic inside the
> function. It reads better. I also changed the connect|disconnect_database
> functions to include the error logic inside it. There is a new parameter
> on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I
> think someone suggested it -- and the error message was moved to outside the
> function. I improved the cleanup routine. It provides useful information if it
> cannot remove the object (publication or replication slot) from the primary.
>
> Since I applied v24-0004, I realized that extra server starts / stops are
> required. It means pg_createsubscriber doesn't start the transformation with the
> current standby settings. Instead, it stops the standby if it is running and
> starts it with the provided command-line options (socket, port,
> listen_addresses). It has a few drawbacks:
> * See v24-0012. It cannot detect if the target server is a primary for another
>   server. It is documented.
> * I also removed the check that the standby is running. If the standby was stopped a
>   long time ago, it will take some time to reach the start point.
> * Dry run mode has to start / stop the service to work correctly. Is it an
>   issue?
>
> Although I decided to include the --retain option, I'm thinking about removing it.
> If logging is enabled, the information produced while pg_createsubscriber runs will
> be available. The client log can be redirected to a file for future inspection.
>
> Comments?

Few comments:
1)   Can we use strdup here instead of atoi, as pg_dump does? Otherwise we do
a double conversion: to an integer using atoi and back to a string while
forming the connection string:
+                       case 'p':
+                               if ((opt.sub_port = atoi(optarg)) <= 0)
+                                       pg_fatal("invalid subscriber
port number");
+                               break;

2) We could enforce a valid range for this; otherwise we end up with
unexpected values when a large number is specified:
+                       case 't':
+                               opt.recovery_timeout = atoi(optarg);
+                               break;

3) Now that we have addressed most of the items, can we handle this TODO:
+               /*
+                * TODO use primary_conninfo (if available) from subscriber and
+                * extract publisher connection string. Assume that there are
+                * identical entries for physical and logical
replication. If there is
+                * not, we would fail anyway.
+                */
+               pg_log_error("no publisher connection string specified");
+               pg_log_error_hint("Try \"%s --help\" for more
information.", progname);
+               exit(1);

4)  The default log level is info here; I was not sure how to set it to
debug level to get these debug messages:
+               pg_log_debug("publisher(%d): connection string: %s",
i, dbinfo[i].pubconninfo);
+               pg_log_debug("subscriber(%d): connection string: %s",
i, dbinfo[i].subconninfo);

5) Currently, in non-verbose mode, no messages are printed to the console.
We could print a few of them regardless of the verbose setting, like the
following:
a) creating publication
b) creating replication slot
c) waiting for the target server to reach the consistent state
d) If pg_createsubscriber fails after this point, you must recreate
the physical replica before continuing.
e) creating subscription

6) The message should be "waiting for the target server to reach the
consistent state":
+#define NUM_CONN_ATTEMPTS      5
+
+       pg_log_info("waiting the target server to reach the consistent state");
+
+       conn = connect_database(conninfo, true);

Regards,
Vignesh



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Mar 6, 2024, at 7:02 AM, Hayato Kuroda (Fujitsu) wrote:
Thanks for updating the patch!

Thanks for the feedback. I'm attaching v26 that addresses most of your comments
and some issues pointed by Vignesh [1].

>* I also removed the check for standby is running. If the standby was stopped a
>  long time ago, it will take some time to reach the start point.
>* Dry run mode has to start / stop the service to work correctly. Is it an
>  issue?

One concern (see the comment below) is that the -l option would not be passed even if
the standby had been logging before pg_createsubscriber was run. Also, some settings
passed by pg_ctl start -o .... would not be restored.

That's a good point. We should state in the documentation that GUCs specified in
the command-line options are ignored during the execution.

>Although I decided to include the --retain option, I'm thinking about removing it.
>If logging is enabled, the information produced while pg_createsubscriber runs will
>be available. The client log can be redirected to a file for future inspection.

Just to confirm: you mean the following, right?
* the client output would be redirected, and
* the -r option would be removed.

Yes. The logging_collector is usually enabled, or syslog is collecting the log
entries. On reflection, another log directory to store entries for a short
period of time doesn't seem like a good idea. It divides the information and it
also costs development time. The questions that made me think about it were:
Should I remove the pg_createsubscriber_output.d directory if it runs
successfully? What if there is an old file there? Is it another directory to
exclude while taking a backup? I also don't like the long directory name.

Here are my initial comments for v25-0001. I read the new doc and it looks very good.
I may review v25-0001 further, but feel free to revise in the meantime.

01. cleanup_objects_atexit
```
PGconn    *conn;
int i;
```
The declaration of *conn can be moved into the for loop. The loop index can also be declared within the for statement.

Changed.

02. cleanup_objects_atexit
```

/*
* If a connection could not be established, inform the user
* that some objects were left on primary and should be
* removed before trying again.
*/
if (dbinfo[i].made_publication)
{
pg_log_warning("There might be a publication \"%s\" in database \"%s\" on primary",
   dbinfo[i].pubname, dbinfo[i].dbname);
pg_log_warning_hint("Consider dropping this publication before trying again.");
}
if (dbinfo[i].made_replslot)
{
pg_log_warning("There might be a replication slot \"%s\" in database \"%s\" on primary",
   dbinfo[i].subname, dbinfo[i].dbname);
pg_log_warning_hint("Drop this replication slot soon to avoid retention of WAL files.");
}
```

Not sure which is better, but we may be able to write the list to a concrete file, as pg_upgrade does.
(I thought this had already been discussed, but could not find it in the archive. Sorry if it is a duplicated comment.)

Do you mean the replication slot file? I think the replication slot name and the
server (primary) are sufficient for checking and fixing it if required.

03. main
```
while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v",
long_options, &option_index)) != -1)
```

Missing update for __shortopts.

Fixed.

04. main
```
case 'D':
opt.subscriber_dir = pg_strdup(optarg);
canonicalize_path(opt.subscriber_dir);
break;
...
case 'P':
opt.pub_conninfo_str = pg_strdup(optarg);
break;
...
case 's':
opt.socket_dir = pg_strdup(optarg);
break;
...
case 'U':
opt.sub_username = pg_strdup(optarg);
break;
```

Should we consider the case where these options are specified twice?
I.e., should we call pg_free() before the assignment?

It isn't a concern for the other client tools. I think the reason is that the
leak doesn't keep growing during the execution. I wouldn't bother with it.

05. main

Missing canonicalize_path() to the socket_dir.

Fixed.

06. main
```
/*
* If socket directory is not provided, use the current directory.
*/
```

A one-line comment can be used here; the trailing period can also be removed at that time.

Fixed.

07. main
```
/*
*
* If subscriber username is not provided, check if the environment
* variable sets it. If not, obtain the operating system name of the user
* running it.
*/
```
Unnecessary blank.

Fixed.

08. main
```
char    *errstr = NULL;
```
 
This declaration can be at else-part.

Fixed.

09. main.

Also, in the first place, do we have to obtain the username if it is not
specified? I think libpq can handle that case if we simply skip passing the info.

Are you suggesting that the username should be optional?

10. main
```
appendPQExpBuffer(sub_conninfo_str, "host=%s port=%u user=%s fallback_application_name=%s",
  opt.socket_dir, opt.sub_port, opt.sub_username, progname);
sub_base_conninfo = get_base_conninfo(sub_conninfo_str->data, NULL);
```

Is it really necessary to call get_base_conninfo()? I think there is no need to
define sub_base_conninfo.

No. Good catch. I removed it.

11. main

```
/*
* In dry run mode, the server is restarted with the provided command-line
* options so validation can be applied in the target server. In order to
* preserve the initial state of the server (running), start it without
* the command-line options.
*/
if (dry_run)
start_standby_server(&opt, pg_ctl_path, NULL, false);
```

I think the initial state of the server may also be stopped; currently both conditions are allowed.
And I think it is not good to leave the logfile unspecified.

Indeed, it is. I don't consider the following test to be the first step because
it is conditional; stopping the server is a precondition for starting it. The
first step is starting the standby server, so the initial state is stopped.

    /*   
     * If the standby server is running, stop it. Some parameters (that can
     * only be set at server start) are informed by command-line options.
     */
    if (stat(pidfile, &statbuf) == 0)
    {    

        pg_log_info("standby is up and running");
        pg_log_info("stopping the server to start the transformation steps");
        stop_standby_server(pg_ctl_path, opt.subscriber_dir);
    }    

    /*   
     * Start a short-lived standby server with temporary parameters (provided
     * by command-line options). The goal is to avoid connections during the
     * transformation steps.
     */
    pg_log_info("starting the standby with command-line options");
    start_standby_server(&opt, pg_ctl_path, server_start_log, true);


12. others

As Peter E pointed out [1], the main function is still huge; it has more than 400 lines.
I think all functions should have fewer than 100 lines to keep them readable.

The previous versions moved a lot of code into separate functions to improve
readability. Since you mentioned it again, I did some refactoring to move more
code out of the main function. I created 2 new functions: setup_recovery
(groups instructions in preparation for recovery) and
drop_primary_replication_slot (removes the primary replication slot if it
exists). At this point, the main steps are in their own functions, making the
code easier to understand IMO.

    /* Get the absolute path of pg_ctl and pg_resetwal on the subscriber */
    pg_ctl_path = get_exec_path(argv[0], "pg_ctl");
    pg_resetwal_path = get_exec_path(argv[0], "pg_resetwal");
    ...
    pg_log_info("Done!");

The code snippet above has ~ 120 lines and the majority of the lines are
comments.

I considered a separation like the one below. Note that this may require
changing the ordering. What do you think?

* add parse_command_options() which accepts user options and verifies them
* add verification_phase() or something which checks system identifier and calls check_XXX
* add catchup_phase() or something which creates a temporary slot, writes recovery parameters,
  and wait until the end of recovery
* add cleanup_phase() or something which removes primary_slot and modifies the
  system identifier
* stop/start server can be combined into one wrapper.

IMO generic steps are more difficult to understand; I tend to avoid them.
However, as I said above, I moved some code into separate functions. We could
probably consider grouping the checks for required/optional arguments into
their own function. Other than that, it wouldn't reduce the size of main() or
improve readability.

The attached txt file is a proof of concept.

13. others

PQresultStatus(res) is called 17 times in this source code, which seems redundant.
I think we can introduce a function like executeQueryOrDie() and gather the checks in one place.

That's a good goal.

14. others

I found that pg_createsubscriber does not use functions declared in other files.
Is there a possibility of reusing some, e.g., from streamutils.h?

Which functions? IIRC we discussed it at the beginning of this thread but I
didn't find any low-hanging fruit to use in this code.

15. others 

While reading the old discussions [2], I saw that Amit suggested keeping the
comment and avoiding the creation of a temporary slot. You said "Got it", but the
temporary slot still exists. Is there a reason? Could you clarify your opinion?

I decided to refactor the code to do what Amit Kapila suggested: use the last
replication slot LSN as the replication start point. I kept it in a separate
patch (v26-0002) to make it easier to review. I'm planning to incorporate it if
nobody objects.

16. others

While reading [2] and [3], I was confused by the decision. You and Amit
discussed combining pg_createsubscriber with slot sync and how to handle slots
on the physical standby. You seemed to agree to remove such a slot, and Amit
also suggested raising an ERROR. However, you said in [8] that such handling is
not mandatory, so a WARNING should be raised in dry_run mode. I was quite
confused. Am I missing something?

I didn't address this item in this patch. As you pointed out, pg_resetwal does
nothing if replication slots exist. That's not an excuse for doing nothing here.
I agree that we should check this case and provide a suitable error message. If
you have a complex replication scenario, users won't be happy with this
restriction. We can always improve the UI (dropping replication slots during the
execution if an option is provided, for example).

17. others

Per the discussion around [4], we might have to consider the case where some options
like data_directory and config_file were initially specified for the standby server.
Another easy approach is to allow users to specify options like -o in pg_upgrade [5],
which is similar to your idea. Thoughts?

I didn't address this item in this patch. I have a half baked patch for it. The
proposal is exactly to allow appending config_file option into -o.

pg_ctl start -o "-c config_file=/etc/postgresql/17/postgresql.conf" ...


18. others

How do you handle the reported failure [6]?

It is a PEBCAK. I don't see an easy way to detect scenario 1; in the current
mode, we are susceptible to this human failure. The base backup support won't
allow it. Regarding scenario 2, the referred error is the way to catch this
wrong command line. Do you expect a different message?

19. main
```
char    *pub_base_conninfo = NULL;
char    *sub_base_conninfo = NULL;
char    *dbname_conninfo = NULL;
```

No need to initialize pub_base_conninfo and sub_base_conninfo.
These variables would not be free'd.

Changed.

20. others

IIUC, slot creation will not finish if there are prepared transactions.
Should we detect this in the verification phase and raise an ERROR?

Maybe. If we decide to do it, we should also check all cases not just prepared
transactions. The other option is to add a sentence into the documentation.

21. others

As I said in [7], the catch-up would not finish if a long recovery_min_apply_delay
is used. Should we override it during the catch-up?

No. If the time-delayed logical replica [2] were available, I would say that we
could use the apply delay for the logical replica. The user can expect the
replica to keep the configured apply delay, but that's not the case if we
silently ignore it. I'm not sure an error is appropriate here because it
requires an extra step. Another option is to print a message saying there is an
apply delay; in dry run mode, the user can detect this case and make a
decision. Does that seem reasonable?

22. pg_createsubscriber.sgml
```
    <para>
     Check
     Write recovery parameters into the target data...
```

I'm not sure, but "Check" seems unneeded.

It was a typo. Fixed.



--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Mar 6, 2024, at 8:24 AM, vignesh C wrote:
Few comments:

Thanks for your review. Some changes are included in v26.

1)   Can we use strdup here instead of atoi, as pg_dump does? Otherwise we do
a double conversion: to an integer using atoi and back to a string while
forming the connection string:
+                       case 'p':
+                               if ((opt.sub_port = atoi(optarg)) <= 0)
+                                       pg_fatal("invalid subscriber
port number");
+                               break;

I don't have a strong preference but decided to provide a patch for it. See
v26-0003.

2) We could enforce a valid range for this; otherwise we end up with
unexpected values when a large number is specified:
+                       case 't':
+                               opt.recovery_timeout = atoi(optarg);
+                               break;

I wouldn't like to add an arbitrary value. Suggestions?

3) Now that we have addressed most of the items, can we handle this TODO:
+               /*
+                * TODO use primary_conninfo (if available) from subscriber and
+                * extract publisher connection string. Assume that there are
+                * identical entries for physical and logical
replication. If there is
+                * not, we would fail anyway.
+                */
+               pg_log_error("no publisher connection string specified");
+               pg_log_error_hint("Try \"%s --help\" for more
information.", progname);
+               exit(1);

It is not in my top priority at the moment.

4)  The default log level is info here; I was not sure how to set it to
debug level to get these debug messages:
+               pg_log_debug("publisher(%d): connection string: %s",
i, dbinfo[i].pubconninfo);
+               pg_log_debug("subscriber(%d): connection string: %s",
i, dbinfo[i].subconninfo);

      <term><option>-v</option></term>
      <term><option>--verbose</option></term>
      <listitem>
       <para>
        Enables verbose mode. This will cause
        <application>pg_createsubscriber</application> to output progress messages
        and detailed information about each step to standard error.
        Repeating the option causes additional debug-level messages to appear on
        standard error.
       </para>

5) Currently, in non-verbose mode, no messages are printed to the console.
We could print a few of them regardless of the verbose setting, like the
following:
a) creating publication
b) creating replication slot
c) waiting for the target server to reach the consistent state
d) If pg_createsubscriber fails after this point, you must recreate
the physical replica before continuing.
e) creating subscription

That's the idea. Quiet mode by default.

6) The message should be "waiting for the target server to reach the
consistent state":
+#define NUM_CONN_ATTEMPTS      5
+
+       pg_log_info("waiting the target server to reach the consistent state");
+
+       conn = connect_database(conninfo, true);

Fixed.


--
Euler Taveira

Re: speed up a logical replica setup

From
vignesh C
Date:
On Thu, 7 Mar 2024 at 10:05, Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Mar 6, 2024, at 7:02 AM, Hayato Kuroda (Fujitsu) wrote:
>
> Thanks for updating the patch!
>
>
> Thanks for the feedback. I'm attaching v26 that addresses most of your comments
> and some issues pointed by Vignesh [1].

Few comments:
1) We disconnect the database and then disconnect again in the error case,
which leads to a double free in this scenario:
+       PQclear(res);
+
+       disconnect_database(conn, false);
+
+       if (max_repslots < num_dbs)
+       {
+               pg_log_error("subscriber requires %d replication
slots, but only %d remain",
+                                        num_dbs, max_repslots);
+               pg_log_error_hint("Consider increasing
max_replication_slots to at least %d.",
+                                                 num_dbs);
+               disconnect_database(conn, true);
+       }
+
+       if (max_lrworkers < num_dbs)
+       {
+               pg_log_error("subscriber requires %d logical
replication workers, but only %d remain",
+                                        num_dbs, max_lrworkers);
+               pg_log_error_hint("Consider increasing
max_logical_replication_workers to at least %d.",
+                                                 num_dbs);
+               disconnect_database(conn, true);
+       }

pg_createsubscriber: error: subscriber requires 5 logical replication
workers, but only 4 remain
pg_createsubscriber: hint: Consider increasing
max_logical_replication_workers to at least 5.
free(): double free detected in tcache 2
Aborted

2) We could also check that the primary is not using
synchronous_standby_names; otherwise all the transactions on the primary will
wait indefinitely once the standby server is stopped. This could be added to
the documentation:
+/*
+ * Is the primary server ready for logical replication?
+ */
+static void
+check_publisher(struct LogicalRepInfo *dbinfo)
+{
+       PGconn     *conn;
+       PGresult   *res;
+
+       char       *wal_level;
+       int                     max_repslots;
+       int                     cur_repslots;
+       int                     max_walsenders;
+       int                     cur_walsenders;
+
+       pg_log_info("checking settings on publisher");
+
+       conn = connect_database(dbinfo[0].pubconninfo, true);
+
+       /*
+        * If the primary server is in recovery (i.e. cascading replication),
+        * objects (publication) cannot be created because it is read only.
+        */
+       if (server_is_in_recovery(conn))
+       {
+               pg_log_error("primary server cannot be in recovery");
+               disconnect_database(conn, true);
+       }

3) This check is present only for the publication; we do not have it when
creating a subscription. We should keep both consistent, i.e., have the check
in both places or in neither:
+       /* Check if the publication already exists */
+       appendPQExpBuffer(str,
+                                         "SELECT 1 FROM pg_catalog.pg_publication "
+                                         "WHERE pubname = '%s'",
+                                         dbinfo->pubname);
+       res = PQexec(conn, str->data);
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not obtain publication information: %s",
+                                        PQresultErrorMessage(res));
+               disconnect_database(conn, true);
+       }
+
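For what it's worth, the subscriber-side counterpart could look like the sketch below (a hypothetical helper, not taken from the patch; the real code would run the query through PQexec() on an open connection and escape the subscription name properly):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): build the query that checks
 * whether a subscription with the given name already exists, mirroring
 * the existing publication check on the publisher side.
 */
void
build_subscription_check_query(char *buf, size_t buflen, const char *subname)
{
	snprintf(buf, buflen,
			 "SELECT 1 FROM pg_catalog.pg_subscription "
			 "WHERE subname = '%s'",
			 subname);
}
```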

4) A few options are missing:
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_createsubscriber</command>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+   <group choice="plain">
+    <group choice="req">
+     <arg choice="plain"><option>-d</option></arg>
+     <arg choice="plain"><option>--database</option></arg>
+    </group>
+    <replaceable>dbname</replaceable>
+    <group choice="req">
+     <arg choice="plain"><option>-D</option> </arg>
+     <arg choice="plain"><option>--pgdata</option></arg>
+    </group>
+    <replaceable>datadir</replaceable>
+    <group choice="req">
+     <arg choice="plain"><option>-P</option></arg>
+     <arg choice="plain"><option>--publisher-server</option></arg>
+    </group>
+    <replaceable>connstr</replaceable>
+   </group>
+  </cmdsynopsis>
+ </refsynopsisdiv>

4a) -n, --dry-run
4b) -p, --subscriber-port
4c) -r, --retain
4d) -s, --socket-directory
4e) -t, --recovery-timeout
4f) -U, --subscriber-username

5) typo connnections should be connections
+       <para>
+        The port number on which the target server is listening for
connections.
+        Defaults to running the target server on port 50432 to avoid unintended
+        client connnections.

6) repliation should be replication
+ * Create the subscriptions, adjust the initial location for logical
+ * replication and enable the subscriptions. That's the last step for logical
+ * repliation setup.

7) I did not notice these changes in the latest patch:
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d808aad8b0..08de2bf4e6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -517,6 +517,7 @@ CreateSeqStmt
 CreateStatsStmt
 CreateStmt
 CreateStmtContext
+CreateSubscriberOptions
 CreateSubscriptionStmt
 CreateTableAsStmt
 CreateTableSpaceStmt
@@ -1505,6 +1506,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalRepInfo
 LogicalRepMsgType
 LogicalRepPartMapEntry
 LogicalRepPreparedTxnData

Regards,
Vignesh



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

> Thanks for the feedback. I'm attaching v26 that addresses most of your comments
> and some issues pointed by Vignesh [1].

I have created a top-up patch v27-0004. It contains additional test
cases for pg_createsubscriber.

Currently, two testcases (in v27-0004 patch) are failing. These
failures are related to running pg_createsubscriber on nodes in
cascade physical replication and are already reported in [1] and [2].
I think these cases should be fixed. Thoughts?

The idea of this patch is to keep track of testcases, so that any
future patch does not break a scenario that has already been handled.
These testcases can be used during our development process, but which
tests should actually be committed can be discussed later. Thoughts?

[1]: https://www.postgresql.org/message-id/CAHv8Rj+5mzK9Jt+7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g@mail.gmail.com
[2]:
https://www.postgresql.org/message-id/CAA4eK1Kq8qWiBK1-ky%2BjkuJRedoWLh5%3DVOmd%2BYwh9L8PUxdq%2BQ%40mail.gmail.com



Thanks and regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
Tomas Vondra
Date:
Hi,

I decided to take a quick look on this patch today, to see how it works
and do some simple tests. I've only started to get familiar with it, so
I have only some comments / questions regarding usage, not on the code.
It's quite possible I didn't understand some finer points, or maybe it
was already discussed earlier in this very long thread, so please feel
free to push back or point me to the past discussion.

Also, some of this is rather opinionated, but considering I didn't see
this patch before, my opinions may easily be wrong ...


1) SGML docs

It seems the SGML docs are more about explaining how this works on the
inside, rather than how to use the tool. Maybe that's intentional, but
as someone who didn't work with pg_createsubscriber before I found it
confusing and not very helpful.

For example, the first half of the page is prerequisites + warnings, and
sure those are useful details, but the prerequisites are checked by the
tool (so I can't really miss them) and the warnings go into a lot of detail
about different places where things may go wrong. Sure, worth knowing
and including in the docs, but maybe not right at the beginning, before
I learn how to even run the tool?

Maybe that's just me, though. Also, I'm sure it's not the only part of
our docs like this. Perhaps it'd be good to reorganize the content a bit
to make the "how to use" stuff more prominent?


2) this is a bit vague

... pg_createsubscriber will check a few times if the connection has
been reestablished to stream the required WAL. After a few attempts, it
terminates with an error.

What does "a few times" mean, and how many is "a few attempts"? Seems
worth knowing when using this tool in environments where disconnections
can happen. Maybe this should be configurable?
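A minimal sketch of what a configurable retry loop could look like (hypothetical; the patch itself hard-codes the number of attempts, and the callback name here is made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch (not from the patch): retry a connection probe up to
 * max_attempts times.  try_connect stands in for whatever callback
 * re-checks that the replication connection has been reestablished.
 */
bool
wait_for_connection(bool (*try_connect)(void *), void *arg, int max_attempts)
{
	for (int attempt = 0; attempt < max_attempts; attempt++)
	{
		if (try_connect(arg))
			return true;
		/* the real tool would pg_usleep() between attempts */
	}
	return false;
}

/* Example probe that succeeds on the Nth call, for demonstration. */
int			demo_calls = 0;

bool
demo_probe(void *arg)
{
	int			succeed_on = *(int *) arg;

	return ++demo_calls >= succeed_on;
}
```

Exposing max_attempts (and the sleep interval) as command-line options would make the behavior predictable in flaky network environments.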


3) publication info

For a while I was quite confused about which tables get replicated,
until I realized the publication is FOR ALL TABLES. But I learned that
from this thread, the docs say nothing about this. Surely that's an
important detail that should be mentioned?


4) Is FOR ALL TABLES a good idea?

I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
sure it won't work for a number of use cases. I know that in large databases
it's common to create "work tables" (not necessarily temporary) as part
of a batch job, but there's no need to replicate those tables.

AFAIK that'd break this FOR ALL TABLES publication, because the tables
will qualify for replication, but won't be present on the subscriber. Or
did I miss something?

I do understand that FOR ALL TABLES is the simplest approach, and for v1
it may be an acceptable limitation, but maybe it'd be good to also
support restricting which tables should be replicated (e.g. blacklist or
whitelist based on table/schema name?).

BTW if I'm right and creating a table breaks the subscriber creation,
maybe it'd be good to explicitly mention that in the docs.

Note: I now realize this might fall under the warning about DDL, which
says this:

    Executing DDL commands on the source server while running
    pg_createsubscriber is not recommended. If the target server has
    already been converted to logical replica, the DDL commands must
    not be replicated so an error would occur.

But I find this confusing. Surely there are many DDL commands that have
absolutely no impact on logical replication (like creating an index or
view, various ALTER TABLE flavors, and so on). And running such DDL
certainly does not trigger error, right?


5) slot / publication / subscription name

I find it somewhat annoying it's not possible to specify names for
objects created by the tool - replication slots, publication and
subscriptions. If this is meant to be a replica running for a while,
after a while I'll have no idea what pg_createsubscriber_569853 or
pg_createsubscriber_459548_2348239 was meant for.

This is particularly annoying because renaming these objects later is
either not supported at all (e.g. for replication slots), or may be
quite difficult (e.g. publications).

I do realize there are challenges with custom names (say, if there are
multiple databases to replicate), but can't we support some simple
formatting with basic placeholders? So we could specify

--slot-name "myslot_%d_%p"

or something like that?
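Such a pattern could be expanded with a small helper along these lines (a sketch assuming %d = database name and %p = port/PID; nothing like this exists in the patch, and real code would need stricter truncation handling):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: expand %d (database name) and %p (port or PID)
 * placeholders in a user-supplied object name pattern.
 */
void
expand_name_pattern(char *out, size_t outlen, const char *pattern,
					const char *dbname, const char *pid)
{
	size_t		used = 0;

	for (const char *p = pattern; *p != '\0' && used + 1 < outlen; p++)
	{
		const char *sub = NULL;

		if (p[0] == '%' && p[1] == 'd')
			sub = dbname;
		else if (p[0] == '%' && p[1] == 'p')
			sub = pid;

		if (sub != NULL)
		{
			int			n = snprintf(out + used, outlen - used, "%s", sub);

			used += (size_t) n;
			if (used >= outlen)
				used = outlen - 1;	/* clamp on truncation */
			p++;				/* skip the placeholder letter */
		}
		else
			out[used++] = *p;
	}
	out[used] = '\0';
}
```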

BTW what will happen if we convert multiple standbys? Can't they all get
the same slot name (they all have the same database OID, and I'm not
sure how much entropy the PID has)?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

> Thanks for the feedback. I'm attaching v26 that addresses most of your comments
> and some issues pointed by Vignesh [1].

I have tested the patch on Windows. The pg_createsubscriber command is
failing on Windows:

pg_createsubscriber -D ..\standby -d postgres  -P "host=localhost
port=5432" --subscriber-port 9000  -r -v
pg_createsubscriber: validating connection string on publisher
pg_createsubscriber: validating connection string on subscriber
pg_createsubscriber: checking if directory "../standby" is a cluster
data directory
pg_createsubscriber: getting system identifier from publisher
pg_createsubscriber: system identifier is 7343852918334005220 on publisher
pg_createsubscriber: getting system identifier from subscriber
pg_createsubscriber: system identifier is 7343852918334005220 on subscriber
pg_createsubscriber: standby is up and running
pg_createsubscriber: stopping the server to start the transformation steps
pg_createsubscriber: server was stopped
pg_createsubscriber: starting the standby with command-line options
pg_createsubscriber: server was started
pg_createsubscriber: checking settings on subscriber
pg_createsubscriber: error: connection to database failed: connection
to server on socket "D:/project/pg_euler_v27_debug/bin/.s.PGSQL.9000"
failed: Connection refused (0x0000274D/10061)
        Is the server running locally and accepting connections on that socket?

I found out that
+ sub_base_conninfo = psprintf("host=%s port=%u user=%s fallback_application_name=%s",
+                opt.socket_dir, opt.sub_port, opt.sub_username, progname);

sub_base_conninfo has 'host' even on Windows.
So when 'start_standby_server' is called, the server is started on
localhost, but when we try to connect to the standby inside
'check_subscriber', it tries to connect to the host defined in
'sub_base_conninfo'. That is why we get the error.

Created a top-up patch v27-0005 to resolve this.
Since there is no change in the 0001, 0002...0004 patches that were
previously posted, I have reused the same version number.

Thanks and regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Thu, 7 Mar 2024 at 18:31, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> Hi,
>
> > Thanks for the feedback. I'm attaching v26 that addresses most of your comments
> > and some issues pointed by Vignesh [1].
>
> I have created a top-up patch v27-0004. It contains additional test
> cases for pg_createsubscriber.
>
> Currently, two testcases (in v27-0004 patch) are failing. These
> failures are related to running pg_createsubscriber on nodes in
> cascade physical replication and are already reported in [1] and [2].
> I think these cases should be fixed. Thoughts?

We can fix these issues; if we are not planning to fix any of them, we
can document the limitation instead.

> The idea of this patch is to keep track of testcases, so that any
> future patch does not break any scenario which has already been worked
> on. These testcases can be used to test in our development process,
> but which test should actually be committed, can be discussed later.
> Thought?

Few comments for v27-0004-Add-additional-testcases.patch:
1) We could use command_fails_like to verify the reason of the error:
+# set max_replication_slots
+$node_p->append_conf('postgresql.conf', 'max_replication_slots = 3');
+$node_p->restart;
+command_fails(
+       [
+               'pg_createsubscriber', '--verbose',
+               '--dry-run', '--pgdata',
+               $node_s->data_dir, '--publisher-server',
+               $node_p->connstr('pg1'), '--socket-directory',
+               $node_s->host, '--subscriber-port',
+               $node_s->port, '--database',
+               'pg1', '--database',
+               'pg2',
+       ],
+       'max_replication_slots are less in number in publisher');

2) Add a comment saying what is being verified
+# set max_replication_slots
+$node_p->append_conf('postgresql.conf', 'max_replication_slots = 3');
+$node_p->restart;
+command_fails(
+       [
+               'pg_createsubscriber', '--verbose',
+               '--dry-run', '--pgdata',
+               $node_s->data_dir, '--publisher-server',
+               $node_p->connstr('pg1'), '--socket-directory',
+               $node_s->host, '--subscriber-port',
+               $node_s->port, '--database',
+               'pg1', '--database',
+               'pg2',
+       ],
+       'max_replication_slots are less in number in publisher');

3) We could rename this file to something like
pg_create_subscriber_failure_cases, or something better:
 src/bin/pg_basebackup/t/041_tests.pl | 285 +++++++++++++++++++++++++++
 1 file changed, 285 insertions(+)
 create mode 100644 src/bin/pg_basebackup/t/041_tests.pl

diff --git a/src/bin/pg_basebackup/t/041_tests.pl
b/src/bin/pg_basebackup/t/041_tests.pl
new file mode 100644
index 0000000000..2889d60d54
--- /dev/null
+++ b/src/bin/pg_basebackup/t/041_tests.pl
@@ -0,0 +1,285 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group


4) This success case is not required as this would have already been
covered in 040_pg_createsubscriber.pl:
+$node_p->append_conf('postgresql.conf', 'max_replication_slots = 4');
+$node_p->restart;
+command_ok(
+       [
+               'pg_createsubscriber', '--verbose',
+               '--dry-run', '--pgdata',
+               $node_s->data_dir, '--publisher-server',
+               $node_p->connstr('pg1'), '--socket-directory',
+               $node_s->host, '--subscriber-port',
+               $node_s->port, '--database',
+               'pg1', '--database',
+               'pg2',
+       ],
+       'max_replication_slots are accurate on publisher');

5)  We could use command_fails_like to verify the reason of the error:
$node_s->append_conf('postgresql.conf', 'max_replication_slots = 1');
$node_s->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_replication_slots are less in number in subscriber');

6) Add a comment saying what is being verified
$node_s->append_conf('postgresql.conf', 'max_replication_slots = 1');
$node_s->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_replication_slots are less in number in subscriber');

7) This success case is not required as this would have already been
covered in 040_pg_createsubscriber.pl:
$node_s->append_conf('postgresql.conf', 'max_replication_slots = 2');
$node_s->restart;
command_ok(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_replication_slots are less in number in subscriber');

8) We could use command_fails_like to verify the reason of the error:
# set wal_level on publisher
$node_p->append_conf('postgresql.conf', 'wal_level = \'replica\'');
$node_p->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'wal_level must be logical');

9) Add a comment saying what is being verified
# set wal_level on publisher
$node_p->append_conf('postgresql.conf', 'wal_level = \'replica\'');
$node_p->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'wal_level must be logical');

10) This success case is not required as this would have already been
covered in 040_pg_createsubscriber.pl:
$node_p->append_conf('postgresql.conf', 'wal_level = \'logical\'');
$node_p->restart;
command_ok(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'wal_level is logical');

11) We could use command_fails_like to verify the reason of the error:
# set max_wal_senders on publisher
$node_p->append_conf('postgresql.conf', 'max_wal_senders = 2');
$node_p->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_wal_senders is not sufficient');

12) Add a comment saying what is being verified:
# set max_wal_senders on publisher
$node_p->append_conf('postgresql.conf', 'max_wal_senders = 2');
$node_p->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_wal_senders is not sufficient');

13) This success case is not required as this would have already been
covered in 040_pg_createsubscriber.pl:
$node_p->append_conf('postgresql.conf', 'max_wal_senders = 3');
$node_p->restart;
command_ok(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_wal_senders is sufficient');

14) This sleep is not required
# max_worker_processes on subscriber
$node_p->append_conf('postgresql.conf', 'max_worker_processes = 2');
$node_p->restart;
sleep 1;
$node_s->append_conf('postgresql.conf', 'max_worker_processes = 2');
$node_s->restart;

15) Similar comments apply in other places too; I'm not repeating them.

Regards,
Vignesh



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Tomas, Euler,

Thanks for starting to read the thread! Since I'm not an original author,
I want to reply partially.

> I decided to take a quick look on this patch today, to see how it works
> and do some simple tests. I've only started to get familiar with it, so
> I have only some comments / questions regarding usage, not on the code.
> It's quite possible I didn't understand some finer points, or maybe it
> was already discussed earlier in this very long thread, so please feel
> free to push back or point me to the past discussion.
> 
> Also, some of this is rather opinionated, but considering I didn't see
> this patch before, my opinions may easily be wrong ...

I felt your comments were quite valuable.

> 1) SGML docs
> 
> It seems the SGML docs are more about explaining how this works on the
> inside, rather than how to use the tool. Maybe that's intentional, but
> as someone who didn't work with pg_createsubscriber before I found it
> confusing and not very helpful.
>
> For example, the first half of the page is prerequisities+warning, and
> sure those are useful details, but prerequisities are checked by the
> tool (so I can't really miss this) and warnings go into a lot of details
> about different places where things may go wrong. Sure, worth knowing
> and including in the docs, but maybe not right at the beginning, before
> I learn how to even run the tool?

Hmm, right. I considered the improvements below. Tomas and Euler, what do you think?

* Adds more descriptions in the "Description" section.
* Moves the prerequisites + warnings to the "Notes" section.
* Adds a "Usage" section which describes starting from a single node.

> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> sure it won't work for a number of use cases. I know large databases
> it's common to create "work tables" (not necessarily temporary) as part
> of a batch job, but there's no need to replicate those tables.

Indeed, the documentation does not describe that all tables in the database
would be included in the publication.

> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> it may be an acceptable limitation, but maybe it'd be good to also
> support restricting which tables should be replicated (e.g. blacklist or
> whitelist based on table/schema name?).

It may not be directly related, but we considered accepting such options as a next step [1].

> Note: I now realize this might fall under the warning about DDL, which
> says this:
> 
>     Executing DDL commands on the source server while running
>     pg_createsubscriber is not recommended. If the target server has
>     already been converted to logical replica, the DDL commands must
>     not be replicated so an error would occur.

Yeah, they would not be replicated, but they would not lead to an ERROR.
So should we say something like "Creating tables on the source server..."?

> 5) slot / publication / subscription name
> 
> I find it somewhat annoying it's not possible to specify names for
> objects created by the tool - replication slots, publication and
> subscriptions. If this is meant to be a replica running for a while,
> after a while I'll have no idea what pg_createsubscriber_569853 or
> pg_createsubscriber_459548_2348239 was meant for.
> 
> This is particularly annoying because renaming these objects later is
> either not supported at all (e.g. for replication slots), or may be
> quite difficult (e.g. publications).
> 
> I do realize there are challenges with custom names (say, if there are
> multiple databases to replicate), but can't we support some simple
> formatting with basic placeholders? So we could specify
> 
> --slot-name "myslot_%d_%p"
> 
> or something like that?

Not sure we can do that in the first version, but it looks nice. One concern is that I
cannot find applications that accept escape strings like log_line_prefix.
(It may just be because we have no use case.) Do you know of examples?

> BTW what will happen if we convert multiple standbys? Can't they all get
> the same slot name (they all have the same database OID, and I'm not
> sure how much entropy the PID has)?

I tested it and the second try did not work. The primary reason was the publication
name - pg_createsubscriber_%u (oid).
FYI - previously we could reuse the same publications, but based on my comment [2] that
feature was removed. It might have been too optimistic.
[1]:
https://www.postgresql.org/message-id/TY3PR01MB9889CCBD4D9DAF8BD2F18541F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077756323B79042F29DDAEDF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
"Andrey M. Borodin"
Date:

> On 8 Mar 2024, at 12:03, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
>
<v27-0004-Add-additional-testcases.patch><v27-0001-pg_createsubscriber-creates-a-new-logical-replic.patch><v27-0005-Fix-error-for-windows.patch><v27-0002-Use-latest-replication-slot-position-as-replicat.patch><v27-0003-port-replace-int-with-string.patch>

I haven't dug into the thread, but the recent version fails some of CFbot's tests.

http://commitfest.cputube.org/euler-taveira.html
https://cirrus-ci.com/task/4833499115421696
==29928==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a000001458 at pc 0x7f3b29fdedce bp 0x7ffe68fcf1c0 sp 0x7ffe68fcf1b8

Thanks!


Best regards, Andrey Borodin.


Re: speed up a logical replica setup

From
Tomas Vondra
Date:

On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote:
> Dear Tomas, Euler,
> 
> Thanks for starting to read the thread! Since I'm not an original author,
> I want to reply partially.
> 
>> I decided to take a quick look on this patch today, to see how it works
>> and do some simple tests. I've only started to get familiar with it, so
>> I have only some comments / questions regarding usage, not on the code.
>> It's quite possible I didn't understand some finer points, or maybe it
>> was already discussed earlier in this very long thread, so please feel
>> free to push back or point me to the past discussion.
>>
>> Also, some of this is rather opinionated, but considering I didn't see
>> this patch before, my opinions may easily be wrong ...
> 
> I felt your comments were quite valuable.
> 
>> 1) SGML docs
>>
>> It seems the SGML docs are more about explaining how this works on the
>> inside, rather than how to use the tool. Maybe that's intentional, but
>> as someone who didn't work with pg_createsubscriber before I found it
>> confusing and not very helpful.
>>
>> For example, the first half of the page is prerequisities+warning, and
>> sure those are useful details, but prerequisities are checked by the
>> tool (so I can't really miss this) and warnings go into a lot of details
>> about different places where things may go wrong. Sure, worth knowing
>> and including in the docs, but maybe not right at the beginning, before
>> I learn how to even run the tool?
> 
> Hmm, right. I considered below improvements. Tomas and Euler, how do you think?
> 
> * Adds more descriptions in "Description" section.
> * Moves prerequisities+warning to "Notes" section.
> * Adds "Usage" section which describes from a single node.
> 
>> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
>> sure it won't work for a number of use cases. I know large databases
>> it's common to create "work tables" (not necessarily temporary) as part
>> of a batch job, but there's no need to replicate those tables.
> 
> Indeed, the documentation does not describe that all tables in the database
> would be included in the publication.
> 
>> I do understand that FOR ALL TABLES is the simplest approach, and for v1
>> it may be an acceptable limitation, but maybe it'd be good to also
>> support restricting which tables should be replicated (e.g. blacklist or
>> whitelist based on table/schema name?).
> 
> May not directly related, but we considered that accepting options was a next-step [1].
> 
>> Note: I now realize this might fall under the warning about DDL, which
>> says this:
>>
>>     Executing DDL commands on the source server while running
>>     pg_createsubscriber is not recommended. If the target server has
>>     already been converted to logical replica, the DDL commands must
>>     not be replicated so an error would occur.
> 
> Yeah, they would not be replicated, but they would not lead to an ERROR.
> So should we say something like "Creating tables on the source server..."?
> 

Perhaps. Clarifying the docs would help, but it depends on the wording.
For example, I doubt this should talk about "creating tables" because
there are other DDL that (probably) could cause issues (like adding a
column to the table, or something like that).

>> 5) slot / publication / subscription name
>>
>> I find it somewhat annoying it's not possible to specify names for
>> objects created by the tool - replication slots, publication and
>> subscriptions. If this is meant to be a replica running for a while,
>> after a while I'll have no idea what pg_createsubscriber_569853 or
>> pg_createsubscriber_459548_2348239 was meant for.
>>
>> This is particularly annoying because renaming these objects later is
>> either not supported at all (e.g. for replication slots), or may be
>> quite difficult (e.g. publications).
>>
>> I do realize there are challenges with custom names (say, if there are
>> multiple databases to replicate), but can't we support some simple
>> formatting with basic placeholders? So we could specify
>>
>> --slot-name "myslot_%d_%p"
>>
>> or something like that?
> 
> Not sure we can do in the first version, but looks nice. One concern is that I
> cannot find applications which accepts escape strings like log_line_prefix.
> (It may just because we do not have use-case.) Do you know examples?
> 

I can't think of a tool already doing that, but I think that's simply
because it was not needed. Why should we be concerned about this?

>> BTW what will happen if we convert multiple standbys? Can't they all get
>> the same slot name (they all have the same database OID, and I'm not
>> sure how much entropy the PID has)?
> 
> I tested and the second try did not work. The primal reason was the name of publication
> - pg_createsubscriber_%u (oid).
> FYI - previously we can reuse same publications, but based on my comment [2] the
> feature was removed. It might be too optimistic.
> 

OK. I could be convinced the other limitations are reasonable for v1 and
can be improved later, but this seems like something that needs fixing.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: speed up a logical replica setup

From
vignesh C
Date:
On Fri, 8 Mar 2024 at 15:31, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 8 Mar 2024, at 12:03, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> >
<v27-0004-Add-additional-testcases.patch><v27-0001-pg_createsubscriber-creates-a-new-logical-replic.patch><v27-0005-Fix-error-for-windows.patch><v27-0002-Use-latest-replication-slot-position-as-replicat.patch><v27-0003-port-replace-int-with-string.patch>
>
> I haven't dug into the thread, but the recent version fails some of CFbot's tests.
>
> http://commitfest.cputube.org/euler-taveira.html
> https://cirrus-ci.com/task/4833499115421696
> ==29928==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a000001458 at pc 0x7f3b29fdedce bp 0x7ffe68fcf1c0 sp 0x7ffe68fcf1b8

This is because disconnect_database is called twice in the error flow:
+       PQclear(res);
+
+       disconnect_database(conn, false);
+
+       if (max_repslots < num_dbs)
+       {
+               pg_log_error("subscriber requires %d replication slots, but only %d remain",
+                                        num_dbs, max_repslots);
+               pg_log_error_hint("Consider increasing max_replication_slots to at least %d.",
+                                                 num_dbs);
+               disconnect_database(conn, true);
+       }
+
+       if (max_lrworkers < num_dbs)
+       {
+               pg_log_error("subscriber requires %d logical replication workers, but only %d remain",
+                                        num_dbs, max_lrworkers);
+               pg_log_error_hint("Consider increasing max_logical_replication_workers to at least %d.",
+                                                 num_dbs);
+               disconnect_database(conn, true);
+       }
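For illustration, a common way to make such error paths immune to a second disconnect (a sketch of the general pattern with a dummy connection type, not the actual fix in the attached patch set) is to clear the handle after closing it:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a libpq connection, for illustration only. */
typedef struct DummyConn
{
	int			open;
} DummyConn;

/*
 * Sketch: take the pointer by reference and NULL it out after closing,
 * so calling disconnect twice on the same handle cannot double-free.
 */
void
disconnect_once(DummyConn **conn)
{
	if (*conn == NULL)
		return;					/* already disconnected: no-op */
	free(*conn);
	*conn = NULL;
}
```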

This is handled in the attached patch set. Apart from this, there are a
couple of test failures, which will be addressed in an upcoming version.

Regards,
Vignesh

Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Sat, 9 Mar 2024 at 00:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote:
> > Dear Tomas, Euler,
> >
> > Thanks for starting to read the thread! Since I'm not an original author,
> > I want to reply partially.
> >
> >> I decided to take a quick look on this patch today, to see how it works
> >> and do some simple tests. I've only started to get familiar with it, so
> >> I have only some comments / questions regarding usage, not on the code.
> >> It's quite possible I didn't understand some finer points, or maybe it
> >> was already discussed earlier in this very long thread, so please feel
> >> free to push back or point me to the past discussion.
> >>
> >> Also, some of this is rather opinionated, but considering I didn't see
> >> this patch before, my opinions may easily be wrong ...
> >
> > I felt your comments were quite valuable.
> >
> >> 1) SGML docs
> >>
> >> It seems the SGML docs are more about explaining how this works on the
> >> inside, rather than how to use the tool. Maybe that's intentional, but
> >> as someone who didn't work with pg_createsubscriber before I found it
> >> confusing and not very helpful.
> >>
> >> For example, the first half of the page is prerequisites+warning, and
> >> sure those are useful details, but prerequisites are checked by the
> >> tool (so I can't really miss this) and warnings go into a lot of details
> >> about different places where things may go wrong. Sure, worth knowing
> >> and including in the docs, but maybe not right at the beginning, before
> >> I learn how to even run the tool?
> >
> > Hmm, right. I considered the improvements below. Tomas and Euler, what do you think?
> >
> > * Adds more descriptions in "Description" section.
> > * Moves prerequisites+warning to "Notes" section.
> > * Adds "Usage" section which describes from a single node.
> >
> >> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> >> sure it won't work for a number of use cases. I know in large databases
> >> it's common to create "work tables" (not necessarily temporary) as part
> >> of a batch job, but there's no need to replicate those tables.
> >
> > Indeed, the documentation does not describe that all tables in the database
> > would be included in the publication.
> >
> >> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> >> it may be an acceptable limitation, but maybe it'd be good to also
> >> support restricting which tables should be replicated (e.g. blacklist or
> >> whitelist based on table/schema name?).
> >
> > May not directly related, but we considered that accepting options was a next-step [1].
> >
> >> Note: I now realize this might fall under the warning about DDL, which
> >> says this:
> >>
> >>     Executing DDL commands on the source server while running
> >>     pg_createsubscriber is not recommended. If the target server has
> >>     already been converted to logical replica, the DDL commands must
> >>     not be replicated so an error would occur.
> >
> > Yeah, they would not be replicated, but that does not lead to an ERROR.
> > So should we say like "Creating tables on the source server..."?
> >
>
> Perhaps. Clarifying the docs would help, but it depends on the wording.
> For example, I doubt this should talk about "creating tables" because
> there are other DDL that (probably) could cause issues (like adding a
> column to the table, or something like that).
>
> >> 5) slot / publication / subscription name
> >>
> >> I find it somewhat annoying it's not possible to specify names for
> >> objects created by the tool - replication slots, publication and
> >> subscriptions. If this is meant to be a replica running for a while,
> >> after a while I'll have no idea what pg_createsubscriber_569853 or
> >> pg_createsubscriber_459548_2348239 was meant for.
> >>
> >> This is particularly annoying because renaming these objects later is
> >> either not supported at all (e.g. for replication slots), or may be
> >> quite difficult (e.g. publications).
> >>
> >> I do realize there are challenges with custom names (say, if there are
> >> multiple databases to replicate), but can't we support some simple
> >> formatting with basic placeholders? So we could specify
> >>
> >> --slot-name "myslot_%d_%p"
> >>
> >> or something like that?
> >
> > Not sure we can do it in the first version, but it looks nice. One concern is that I
> > cannot find applications which accept escape strings like log_line_prefix.
> > (It may just be because we do not have a use case.) Do you know of examples?
> >
>
> I can't think of a tool already doing that, but I think that's simply
> because it was not needed. Why should we be concerned about this?
>
> >> BTW what will happen if we convert multiple standbys? Can't they all get
> >> the same slot name (they all have the same database OID, and I'm not
> >> sure how much entropy the PID has)?
> >
> > I tested and the second try did not work. The primary reason was the name of the
> > publication - pg_createsubscriber_%u (oid).
> > FYI - previously we could reuse the same publications, but based on my comment [2] the
> > feature was removed. It might be too optimistic.
> >
>
> OK. I could be convinced the other limitations are reasonable for v1 and
> can be improved later, but this seems like something that needs fixing.

+1 to handle this.
Currently, a) publication name = pg_createsubscriber_%u, where %u is
the database oid, b) replication slot name = pg_createsubscriber_%u_%d,
where %u is the database oid and %d is the pid, and c) subscription name =
pg_createsubscriber_%u_%d, where %u is the database oid and %d is the pid.
How about we have a non-mandatory option like
--prefix_object_name=mysetup1, which will create a) publication name =
mysetup1_pg_createsubscriber_%u, b) replication slot name =
mysetup1_pg_createsubscriber_%u (here the pid is also removed), and c)
subscription name = mysetup1_pg_createsubscriber_%u (here the pid is also
removed).

In the default case where the user does not specify
--prefix_object_name the object names will be created without any
prefix names.
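Tomas's placeholder idea upthread could look roughly like the sketch below. The placeholder letters (%d for database name, %p for PID) and the expand_name function are illustrative only, since the thread has not settled on a format:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand a user-supplied name template such as "myslot_%d_%p":
 * %d -> database name, %p -> process id.  Illustrative sketch only;
 * real code would also validate the resulting identifier length.
 */
static void
expand_name(char *dst, size_t dstlen, const char *tmpl,
            const char *dbname, int pid)
{
    size_t      n = 0;

    for (const char *p = tmpl; *p != '\0' && n + 1 < dstlen; p++)
    {
        if (p[0] == '%' && p[1] == 'd')
        {
            n += snprintf(dst + n, dstlen - n, "%s", dbname);
            p++;                /* skip the 'd' */
        }
        else if (p[0] == '%' && p[1] == 'p')
        {
            n += snprintf(dst + n, dstlen - n, "%d", pid);
            p++;                /* skip the 'p' */
        }
        else
            dst[n++] = *p;
    }
    dst[n < dstlen ? n : dstlen - 1] = '\0';
}
```

A template with both placeholders would give distinct names per database and per run, which addresses the "same slot name on multiple standbys" concern.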

Regards,
Vignesh



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

Thanks for updating the patch, but cfbot still got angry [1].
Note that the two containers (autoconf and meson) failed at different places,
so I think it is a genuine failure. It seems that there may be a bug related to the 32-bit build.

We should look into this and fix it as soon as possible.

[1]: http://cfbot.cputube.org/highlights/all.html#4637

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Mar 7, 2024 at 10:44 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> 4) Is FOR ALL TABLES a good idea?
>
> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> sure it won't work for a number of use cases. I know in large databases
> it's common to create "work tables" (not necessarily temporary) as part
> of a batch job, but there's no need to replicate those tables.
>
> AFAIK that'd break this FOR ALL TABLES publication, because the tables
> will qualify for replication, but won't be present on the subscriber. Or
> did I miss something?
>

As the subscriber is created from standby, all the tables should be
present at least initially during and after creating the subscriber.
Users are later free to modify the publications/subscriptions.

> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> it may be an acceptable limitation, but maybe it'd be good to also
> support restricting which tables should be replicated (e.g. blacklist or
> whitelist based on table/schema name?).
>

This would be useful, but OTOH could also be enhanced in a later
version unless we think it is a must for the first version.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Mar 11, 2024 at 9:42 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Sat, 9 Mar 2024 at 00:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> >
> > >> 5) slot / publication / subscription name
> > >>
> > >> I find it somewhat annoying it's not possible to specify names for
> > >> objects created by the tool - replication slots, publication and
> > >> subscriptions. If this is meant to be a replica running for a while,
> > >> after a while I'll have no idea what pg_createsubscriber_569853 or
> > >> pg_createsubscriber_459548_2348239 was meant for.
> > >>
> > >> This is particularly annoying because renaming these objects later is
> > >> either not supported at all (e.g. for replication slots), or may be
> > >> quite difficult (e.g. publications).
> > >>
> > >> I do realize there are challenges with custom names (say, if there are
> > >> multiple databases to replicate), but can't we support some simple
> > >> formatting with basic placeholders? So we could specify
> > >>
> > >> --slot-name "myslot_%d_%p"
> > >>
> > >> or something like that?
> > >
> > > Not sure we can do in the first version, but looks nice. One concern is that I
> > > cannot find applications which accepts escape strings like log_line_prefix.
> > > (It may just because we do not have use-case.) Do you know examples?
> > >
> >
> > I can't think of a tool already doing that, but I think that's simply
> > because it was not needed. Why should we be concerned about this?
> >
>
> +1 to handle this.
> Currently, a) Publication name = pg_createsubscriber_%u, where %u is
> database oid, b) Replication slot name =  pg_createsubscriber_%u_%d,
> Where %u is database oid and %d is the pid and c) Subscription name =
> pg_createsubscriber_%u_%d, Where %u is database oid and %d is the pid
> How about we have a non mandatory option like
> --prefix_object_name=mysetup1, which will create a) Publication name =
> mysetup1_pg_createsubscriber_%u, and b) Replication slot name =
> mysetup1_pg_createsubscriber_%u (here pid is also removed) c)
> Subscription name = mysetup1_pg_createsubscriber_%u (here pid is also
> removed).
>
> In the default case where the user does not specify
> --prefix_object_name the object names will be created without any
> prefix names.
>

Tomas's idea is better in terms of usability. So, we should instead
have three switches --slot-name, --publication-name, and
--subscriber-name with some provision for appending dbids and some
unique identifier for the standby. The unique identifier can help in
creating multiple subscribers from different standbys.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Fri, Mar 8, 2024, at 6:44 AM, Hayato Kuroda (Fujitsu) wrote:
Hmm, right. I considered the improvements below. Tomas and Euler, what do you think?

I'm posting a new patchset v28.

I changed the way the check function works. From a usability
perspective, it is better to test all conditions and report all errors (if
any) at once. It avoids multiple executions in dry-run mode just to figure out
all of the issues in the initial phase. I also included tests for it using
Shlok's idea [1], although I didn't use v27-0004.

Shlok [1] reported that it was failing on Windows since the socket-directory
option was added. I added a fix for it.

Tomas pointed out that the documentation [2] does not provide a good high-level
explanation of pg_createsubscriber. I expanded the Description section and
moved the prerequisites to the Notes section. The prerequisites were grouped into
target and source conditions in their own paragraphs instead of using a list. It
seems more in line with the style of some applications.

As I said in a previous email [3], I removed the retain option.




--
Euler Taveira

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch! I have not reviewed it yet, but cfbot did not
accept your patch [1]. It seems that you missed updating the Windows part
in the 0003 patch:

```
#if !defined(WIN32)
-       sub_base_conninfo = psprintf("host=%s port=%u user=%s fallback_application_name=%s",
+       sub_base_conninfo = psprintf("host=%s port=%s user=%s fallback_application_name=%s",
                                                                 opt.socket_dir, opt.sub_port, opt.sub_username,
progname);
 #else                                                  /* WIN32 */
        sub_base_conninfo = psprintf("port=%u user=%s fallback_application_name=%s"
```

Since the change is quite trivial, I felt no need to bump the version.
PSA the new set to keep the cfbot quiet. It passes the tests on my CI.

[1]:  https://cirrus-ci.com/build/5253379782344704

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Attachment

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

Currently, when pg_createsubscriber errors out and exits mid-execution,
the standby server started by pg_createsubscriber may
still be running. I think this standby server should be stopped when
pg_createsubscriber exits on error, as this server was
started by pg_createsubscriber and may be running with
configurations that are not desired by the user (such as " -c
listen_addresses='' -c unix_socket_permissions=0700"). Thoughts?

Added a top-up patch v28-0005 to fix this issue.
I am not changing the version as v28-0001 to v28-0004 is the same as above.

Thanks and regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Mar 13, 2024, at 10:09 AM, Shlok Kyal wrote:
Added a top-up patch v28-0005 to fix this issue.
I am not changing the version as v28-0001 to v28-0004 is the same as above.

Thanks for your review!

I'm posting a new patch (v29) that merges the previous patches (v28-0002 and
v28-0003). I applied the fix provided by Hayato [1]. It was an oversight during
a rebase. I also included the patch proposed by Shlok [2] that stops the target
server on error if it is running.

Tomas suggested in [3] that maybe the PID should be replaced with something
that has more entropy. Instead of the PID, the tool now uses a random number for
the replication slot and subscription names. There is also a concern about
converting multiple standbys, which would all get the same publication name. The
same random number is added to the publication name so the run doesn't fail because
the publication already exists. Documentation was changed based on Tomas's feedback.

The user name was always included in the subscriber connection string. Let
libpq choose it instead. While on it, a new routine (get_sub_conninfo) now
contains the code to build the subscriber connection string.
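A minimal sketch of what such a routine might look like, following the format strings quoted earlier in the thread; this get_sub_conninfo is a simplified stand-in (the real routine also handles the Windows case, where there is no Unix socket directory) and deliberately omits user= so libpq picks the default:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the subscriber connection string.  Simplified stand-in for
 * illustration: no Windows branch, no escaping; user= is omitted so
 * libpq chooses the default user.
 */
static void
get_sub_conninfo(char *dst, size_t dstlen,
                 const char *socket_dir, const char *port,
                 const char *progname)
{
    snprintf(dst, dstlen, "host=%s port=%s fallback_application_name=%s",
             socket_dir, port, progname);
}
```

Note the port is passed as a string here, which is what the 0003 fix above (port=%u vs port=%s) was about.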

As I said in [4], there wasn't a way to specify a different configuration file.
If your cluster has a postgresql.conf outside PGDATA, when pg_createsubscriber
starts the server it will fail. The new --config-file option lets you specify
the postgresql.conf location so the server starts just fine.

I also made some changes in the start_standby_server routine. I replaced the
strcat and snprintf calls with appendPQExpBuffer, which is now used to build
the pg_ctl command.




--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Mar 15, 2024 at 9:23 AM Euler Taveira <euler@eulerto.com> wrote:
>

Did you consider adding options for publication/subscription/slot
names as mentioned in my previous email? As discussed in a few emails
above, it would be quite confusing for users to identify the logical
replication objects once the standby is converted to subscriber.

*
+static void
+cleanup_objects_atexit(void)
{
...
conn = connect_database(dbinfo[i].pubconninfo, false);
+ if (conn != NULL)
+ {
+ if (dbinfo[i].made_publication)
+ drop_publication(conn, &dbinfo[i]);
+ if (dbinfo[i].made_replslot)
+ drop_replication_slot(conn, &dbinfo[i], dbinfo[i].subname);
+ disconnect_database(conn, false);
+ }
+ else
+ {
+ /*
+ * If a connection could not be established, inform the user
+ * that some objects were left on primary and should be
+ * removed before trying again.
+ */
+ if (dbinfo[i].made_publication)
+ {
+ pg_log_warning("There might be a publication \"%s\" in database \"%s\" on primary",
+    dbinfo[i].pubname, dbinfo[i].dbname);
+ pg_log_warning_hint("Consider dropping this publication before trying again.");
+ }

It seems we care only for publications created on the primary. Isn't
it possible that some of the publications have been replicated to
standby by that time, for example, in case failure happens after
creating a few publications? IIUC, we don't care for standby cleanup
after failure because it can't be used for streaming replication
anymore. So, the only choice the user has is to recreate the standby
and start the pg_createsubscriber again. This sounds questionable to
me as to whether users would like this behavior. Does anyone else have
an opinion on this point?

I see the below note in the patch:
+    If <application>pg_createsubscriber</application> fails while processing,
+    then the data directory is likely not in a state that can be recovered. It
+    is true if the target server was promoted. In such a case, creating a new
+    standby server is recommended.

By reading this it is not completely clear whether the standby is not
recoverable in case of any error or only an error after the target
server is promoted. If others agree with this behavior then we should
write the detailed reason for this somewhere in the comments as well
unless it is already explained.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 11 Mar 2024 at 10:33, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> Thanks for updating the patch, but cfbot still got angry [1].
> Note that two containers (autoconf and meson) failed at different place,
> so I think it is intentional one. It seems that there may be a bug related with 32-bit build.
>
> We should see and fix as soon as possible.

I was able to reproduce this random failure and found the following reason:
the minimum recovery ending location (0/5000000) was greater than the
specified recovery_target_lsn (0/4001198). In a few random cases the
standby applies a few more WAL records after the replication slot is
created; this leads to the minimum recovery ending location being greater
than the recovery_target_lsn, because of which the server will fail
with:
FATAL:  requested recovery stop point is before consistent recovery point

I have fixed it by pausing replay on the standby server before the
replication slots get created.
The attached v29-0002-Keep-standby-server-s-minimum-recovery-point-les.patch
patch has the changes for the same.
Thoughts?
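The failing condition is a plain LSN comparison: PostgreSQL prints LSNs as two hex halves "X/Y", and here the minimum recovery ending location (0/5000000) sorts after the requested stop point (0/4001198). A sketch of the comparison (parse_lsn is illustrative, not server code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse the textual LSN form "X/Y" (two hex words) into a 64-bit value. */
static uint64_t
parse_lsn(const char *s)
{
    unsigned int hi = 0,
                 lo = 0;

    sscanf(s, "%X/%X", &hi, &lo);
    return ((uint64_t) hi << 32) | lo;
}
```

Recovery refuses to stop at a target LSN smaller than the minimum recovery point, which is exactly the FATAL quoted above.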

Regards,
Vignesh

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
Did you consider adding options for publication/subscription/slot
names as mentioned in my previous email? As discussed in a few emails
above, it would be quite confusing for users to identify the logical
replication objects once the standby is converted to subscriber.

Yes. I was planning to implement it after v1 is pushed. I started to write code
for it but I wasn't sure about the UI. The best approach I came up with was
multiple options matched by position. (I don't provide short options to avoid
possible portability issues with the order.) It means if I have 3 databases and
the following command line:

pg_createsubscriber ... --database pg1 --database pg2 --database pg3 --publication
pubx --publication puby --publication pubz

pubx, puby and pubz are created in databases pg1, pg2, and pg3, respectively.
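The positional matching can be sketched like this; struct dbinfo and pair_options are illustrative stand-ins, and the count check mirrors the later requirement that the number of name options match the number of --database options:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Pair the i-th name option with the i-th --database option, as in the
 * command line above.  Illustrative stand-in, not the tool's code: if no
 * names were given the tool generates them; if some were given, their
 * count must match the number of databases.
 */
struct dbinfo
{
    const char *dbname;
    const char *pubname;        /* NULL means "generate a name" */
};

static int
pair_options(struct dbinfo *out,
             const char **dbs, int num_dbs,
             const char **pubs, int num_pubs)
{
    if (num_pubs != 0 && num_pubs != num_dbs)
        return -1;              /* mismatched counts are an error */

    for (int i = 0; i < num_dbs; i++)
    {
        out[i].dbname = dbs[i];
        out[i].pubname = (num_pubs > 0) ? pubs[i] : NULL;
    }
    return num_dbs;
}
```

The same pairing applies to --subscription and --replication-slot.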

It seems we care only for publications created on the primary. Isn't
it possible that some of the publications have been replicated to
standby by that time, for example, in case failure happens after
creating a few publications? IIUC, we don't care for standby cleanup
after failure because it can't be used for streaming replication
anymore. So, the only choice the user has is to recreate the standby
and start the pg_createsubscriber again. This sounds questionable to
me as to whether users would like this behavior. Does anyone else have
an opinion on this point?

If it happens after creating a publication and before promotion, the cleanup
routine will drop the publications on the primary, and the drop will eventually
be applied to the standby via replication later.

I see the below note in the patch:
+    If <application>pg_createsubscriber</application> fails while processing,
+    then the data directory is likely not in a state that can be recovered. It
+    is true if the target server was promoted. In such a case, creating a new
+    standby server is recommended.

By reading this it is not completely clear whether the standby is not
recoverable in case of any error or only an error after the target
server is promoted. If others agree with this behavior then we should
write the detailed reason for this somewhere in the comments as well
unless it is already explained.

I rewrote the sentence to make it clear that only if the server is promoted
will the target server be in a state that cannot be reused. A message also
says so:

pg_createsubscriber: target server reached the consistent state
pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you
must recreate the physical replica before continuing.

I'm attaching a new version (v30) that adds:

* 3 new options (--publication, --subscription, --replication-slot) to assign
  names to the objects. The --database option used to ignore duplicate names,
  however, since these new options rely on the number of database options to
  match the number of object name options, it is forbidden from now on. The
  duplication is also forbidden for the object names to avoid errors earlier.
* rewrite the paragraph related to the unusable target server after
  pg_createsubscriber fails.
* Vignesh reported an issue [1] related to reaching the recovery stop point
  before the consistent state is reached. I proposed a simple patch that fixes
  the issue.



--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Sat, Mar 16, 2024, at 10:31 AM, vignesh C wrote:
I was able to reproduce this random failure and found the following reason:
The minimum recovery ending location (0/5000000) was greater than the
specified recovery_target_lsn (0/4001198). In a few random cases the
standby applies a few more WAL records after the replication slot is
created; this leads to the minimum recovery ending location being greater
than the recovery_target_lsn, because of which the server will fail
with:
FATAL:  requested recovery stop point is before consistent recovery point

Thanks for checking. I proposed an alternative patch for it [1]. Can you check
it?



--
Euler Taveira

Re: speed up a logical replica setup

From
vignesh C
Date:
On Sat, 16 Mar 2024 at 21:16, Euler Taveira <euler@eulerto.com> wrote:
>
> On Sat, Mar 16, 2024, at 10:31 AM, vignesh C wrote:
>
> I was able to reproduce this random failure and found the following reason:
> The Minimum recovery ending location 0/5000000 was more than the
> recovery_target_lsn specified is "0/4001198". In few random cases the
> standby applies a few more WAL records after the replication slot is
> created; this leads to minimum recovery ending location being greater
> than the recovery_target_lsn because of which the server will fail
> with:
> FATAL:  requested recovery stop point is before consistent recovery point
>
>
> Thanks for checking. I proposed an alternative patch for it [1]. Can you check
> it?

This approach looks good to me.

Regards,
Vignesh



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Sat, Mar 16, 2024 at 9:13 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
>
> Did you consider adding options for publication/subscription/slot
> names as mentioned in my previous email? As discussed in a few emails
> above, it would be quite confusing for users to identify the logical
> replication objects once the standby is converted to subscriber.
>
>
> Yes. I was wondering to implement after v1 is pushed. I started to write a code
> for it but I wasn't sure about the UI. The best approach I came up with was
> multiple options in the same order. (I don't provide short options to avoid
> possible portability issues with the order.) It means if I have 3 databases and
> the following command-line:
>
> pg_createsubscriber ... --database pg1 --database pg2 --database pg3 --publication
> pubx --publication puby --publication pubz
>
> pubx, puby and pubz are created in the database pg1, pg2, and pg3 respectively.
>

With this syntax, you want to give the user the option to specify a
publication/subscription/slot name for each database? If so, won't it
be too verbose?

> It seems we care only for publications created on the primary. Isn't
> it possible that some of the publications have been replicated to
> standby by that time, for example, in case failure happens after
> creating a few publications? IIUC, we don't care for standby cleanup
> after failure because it can't be used for streaming replication
> anymore. So, the only choice the user has is to recreate the standby
> and start the pg_createsubscriber again. This sounds questionable to
> me as to whether users would like this behavior. Does anyone else have
> an opinion on this point?
>
>
> If it happens after creating a publication and before promotion, the cleanup
> routine will drop the publications on primary and it will eventually be applied
> to the standby via replication later.
>

Do you mean to say that the next time the user runs
pg_createsubscriber, it will be ensured that all the prior WAL has been
replicated? I think we need to ensure that after the restart of this
tool, and before it attempts to create the publications again, the WAL
of the previous drop has been replayed.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Mar 18, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 9:13 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
> >
> > Did you consider adding options for publication/subscription/slot
> > names as mentioned in my previous email? As discussed in a few emails
> > above, it would be quite confusing for users to identify the logical
> > replication objects once the standby is converted to subscriber.
> >
> >
> > Yes. I was wondering to implement after v1 is pushed. I started to write a code
> > for it but I wasn't sure about the UI. The best approach I came up with was
> > multiple options in the same order. (I don't provide short options to avoid
> > possible portability issues with the order.) It means if I have 3 databases and
> > the following command-line:
> >
> > pg_createsubscriber ... --database pg1 --database pg2 --database pg3 --publication
> > pubx --publication puby --publication pubz
> >
> > pubx, puby and pubz are created in the database pg1, pg2, and pg3 respectively.
> >
>
> With this syntax, you want to give the user the option to specify
> publication/subscription/slot name for each database? If so, won't it
> be too much verbose?
>
> > It seems we care only for publications created on the primary. Isn't
> > it possible that some of the publications have been replicated to
> > standby by that time, for example, in case failure happens after
> > creating a few publications? IIUC, we don't care for standby cleanup
> > after failure because it can't be used for streaming replication
> > anymore. So, the only choice the user has is to recreate the standby
> > and start the pg_createsubscriber again. This sounds questionable to
> > me as to whether users would like this behavior. Does anyone else have
> > an opinion on this point?
> >
> >
> > If it happens after creating a publication and before promotion, the cleanup
> > routine will drop the publications on primary and it will eventually be applied
> > to the standby via replication later.
> >
>
> Do you mean to say that the next time if user uses
> pg_createsubscriber, it will be ensured that all the prior WAL will be
> replicated? I think we need to ensure that after the restart of this
> tool and before it attempts to create the publications again, the WAL
> of the previous drop has to be replayed.
>

On further thinking, the WAL for such dropped publications should get
replayed eventually, before the WAL for new publications (the
publications which will be created after restart), unless the required
WAL is removed on the primary for some reason.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Euler,

Thanks for updating the patch. Here are my comments.
I used Grammarly to proofread the sentences.
(The tool strongly recommends using the active voice, but I can ignore that for now.)

01.

"After a successful run, the state of the target server is analagous to a fresh
logical replication setup."
s/analagous/analogous

02.

"The main difference between the logical replication setup and pg_createsubscriber
is the initial data copy."

Grammarly suggests:
"The initial data copy is the main difference between the logical replication
setup and pg_createsubscriber."

03.

"Only the synchronization phase is done, which ensures each table is brought up
to a synchronized state."

This sentence is not very clear to me. How about:
"pg_createsubscriber does only the synchronization phase, ensuring each table's
replication state is ready."

04.

"The pg_createsubscriber targets large database systems because most of the
execution time is spent making the initial data copy."

Hmm, but the initial data sync by logical replication also spends most of its
time making the initial data copy. IIUC the bottlenecks are a) this application
must stop and start the server several times, and b) only serial copy works.
Can you clarify?

05.

It is better to state the internal difference between pg_createsubscriber and the
initial sync by logical replication. For example:
pg_createsubscriber uses the physical replication mechanism to ensure the standby
catches up to a certain point. Then, it converts the standby to a
subscriber by promoting it and creating subscriptions.

06.

"If these are not met an error will be reported."

Grammarly suggests:
"If these are not met, an error will be reported."

07.

"The given target data directory must have the same system identifier than the
source data directory."

Grammarly suggests:
"The given target data directory must have the same system identifier as the
source data directory."

08.

"If a standby server is running on the target data directory or it is a base
backup from the source data directory, system identifiers are the same."

This line is not needed if bullet style is not used. The line is just a supplement,
not a prerequisite.

09.

"The source server must accept connections from the target server. The source server must not be in recovery."

Grammarly suggests:
"The source server must accept connections from the target server and not be in recovery."

10.

"Publications cannot be created in a read-only cluster."

Same as 08, this line is not needed if bullet-style is not used.

11.

"pg_createsubscriber usually starts the target server with different connection
settings during the transformation steps. Hence, connections to target server
might fail."
 
Grammarly suggests:
"pg_createsubscriber usually starts the target server with different connection
settings during transformation. Hence, connections to the target server might fail."

12.

"During the recovery process,"

Grammarly suggests:
"During recovery,"

13.

"replicated so an error would occur."

Grammarly suggests:
"replicated, so an error would occur."

14.

"It would avoid situations in which WAL files from the source server might be
used by the target server."

Grammarly suggests:
"It would avoid situations in which the target server might use WAL files from
the source server."

15.

"a LSN"

s/a/an

16.

"of write-ahead"

s/of/of the/

17.

"specifies promote"

We can double-quote the word promote.

18.

"are also added so it avoids"

Grammarly suggests:
"are added to avoid"

19.

"is accepting read-write transactions"

Grammarly suggests:
"accepts read-write transactions"

20.

New options must also be documented. This helps not only users but also
reviewers.
(Sometimes we cannot identify whether the implementation is intentional or not.)

21.

Also, I'm not sure the specification is good. I would prefer to specify them
by a format string, because it can reduce the number of arguments, and I
cannot find use cases in which users want to control the names of the objects.

However, your approach has the benefit that users can easily identify the
objects generated by pg_createsubscriber. What do others think?

22.

```
#define    BASE_OUTPUT_DIR        "pg_createsubscriber_output.d"
```

Nothing refers to this define.

23.
 
```
}            CreateSubscriberOptions;
...
}            LogicalRepInfo;
```

The identifiers after the "}" are not needed, because we do not use typedef.
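
To illustrate the point (a standalone sketch; the names are invented, not from the patch): without typedef, the identifier after the closing brace declares a global variable of the anonymous struct type rather than a type name.

```c
#include <assert.h>

/* Without typedef, the identifier after the closing brace declares a
 * global *variable* of the anonymous struct type: */
struct
{
	int			port;
}			AccidentalGlobalVar;	/* a variable, not a type */

/* With typedef, the same position names a type instead: */
typedef struct
{
	int			port;
} IntendedOptionsType;
```

This is presumably why the patch, as written, defines unintended global variables.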

22.

While looking at the function definitions, I found that some pointers are
declared as const, but others are not. E.g., "char *lsn" in setup_recovery()
won't be changed but is not const. Is it just an omission, or is there
another rule?

23.

```
static int    num_dbs = 0;
static int    num_pubs = 0;
static int    num_subs = 0;
static int    num_replslots = 0;
```

I think the names are a bit confusing. The number of generated
publications/subscriptions/replication slots is always the same as the number
of databases; these variables just indicate the number of names specified.

My idea is num_custom_pubs or something similar. Thoughts?

24.

```
/* standby / subscriber data directory */
static char *subscriber_dir = NULL;
```

It is a bit strange that only subscriber_dir is a global variable. Callers
require the CreateSubscriberOptions as an argument, except for
cleanup_objects_atexit() and main(). So, how about making
`CreateSubscriberOptions opt` a global as well?

25.

```
 * Replication slots, publications and subscriptions are created. Depending on
 * the step it failed, it should remove the already created objects if it is
 * possible (sometimes it won't work due to a connection issue).
```

I think it should be stated here that subscriptions won't be removed, along
with the reason.

26.

```

    /*
     * If the server is promoted, there is no way to use the current setup
     * again. Warn the user that a new replication setup should be done before
     * trying again.
     */
```

Per comment 25, we could add a reference like "See comments atop the function".

27.

usage() was not updated based on recent changes.

28.

```
        if (strcmp(conn_opt->keyword, "dbname") == 0 && conn_opt->val != NULL)
        {
            if (dbname)
                *dbname = pg_strdup(conn_opt->val);
            continue;
        }
```

There is a memory leak if multiple dbname keywords are specified in the conninfo.

29.

```
    pg_prng_seed(&prng_state, (uint64) (getpid() ^ time(NULL)));
```

There is no need to initialize the seed every time. Can you reuse the
pg_prng_state?
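
The idea amounts to lazy one-time seeding, sketched below with a toy xorshift generator standing in for pg_prng (only the seeding pattern is the point; the generator itself is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

static uint64_t prng_state;
static bool prng_seeded = false;

/* Seed once on first use instead of on every call. */
static uint64_t
next_random(void)
{
	if (!prng_seeded)
	{
		prng_state = ((uint64_t) (getpid() ^ time(NULL))) | 1;	/* nonzero seed */
		prng_seeded = true;
	}
	/* xorshift64: a toy stand-in for pg_prng_uint64() */
	prng_state ^= prng_state << 13;
	prng_state ^= prng_state >> 7;
	prng_state ^= prng_state << 17;
	return prng_state;
}
```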

30.

```
        if (num_replslots == 0)
            dbinfo[i].replslotname = pg_strdup(genname);
```

I think the straightforward way is to use the name of subscription if no name
is specified. This follows the rule for CREATE SUBSCRIPTION.

31.

```
        /* Create replication slot on publisher */
        if (lsn)
            pg_free(lsn);
```

I think allocating/freeing memory is not so efficient.
Can we add a flag to create_logical_replication_slot() for controlling the
return value (NULL or a duplicated string)? We can use the condition
(i == num_dbs - 1) as the flag.

32.

```
/*
 * Close the connection. If exit_on_error is true, it has an undesired
 * condition and it should exit immediately.
 */
static void
disconnect_database(PGconn *conn, bool exit_on_error)
```

In the case of disconnect_database(), the second argument should have a
different name: if it is true, the process exits unconditionally.
Also, the comment atop the function must be fixed.


33.

```
    wal_level = strdup(PQgetvalue(res, 0, 0));
```

pg_strdup should be used here.

34.

```
        {"config-file", required_argument, NULL, 1},
        {"publication", required_argument, NULL, 2},
        {"replication-slot", required_argument, NULL, 3},
        {"subscription", required_argument, NULL, 4},
```

The ordering looks strange to me. In pg_upgrade and pg_basebackup, options
that do not have a short form are listed last.

35.

```
    opt.sub_port = palloc(16);
```

Per other lines, pg_malloc() should be used.

36.

```
                pg_free(opt.sub_port);
```

You said that the leak is not a concern here. If so, why does only 'p' have
pg_free()?

37.

```
    /* Register a function to clean up objects in case of failure */
    atexit(cleanup_objects_atexit);
```

Sorry if we have already discussed this. I think the registration can be moved
to just before the standby is started; before that point, the callback is a
no-op.

38.

```
    /* Subscriber PID file */
    snprintf(pidfile, MAXPGPATH, "%s/postmaster.pid", subscriber_dir);

    /*
     * If the standby server is running, stop it. Some parameters (that can
     * only be set at server start) are informed by command-line options.
     */
    if (stat(pidfile, &statbuf) == 0)
```

Hmm. pidfile is used only here, but it is declared in main(). Can this be
separated into another function like is_standby_started()?

39.

Or, we may be able to introduce "restart_standby_if_needed" or something
similar.
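
A sketch of such a helper (the name is hypothetical) doing the same postmaster.pid check that main() currently does inline:

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

#define MAXPGPATH 1024

/* Hypothetical helper: report whether a postmaster appears to be running
 * in the given data directory, judged by the presence of postmaster.pid. */
static bool
is_standby_started(const char *datadir)
{
	char		pidfile[MAXPGPATH];
	struct stat statbuf;

	snprintf(pidfile, sizeof(pidfile), "%s/postmaster.pid", datadir);
	return stat(pidfile, &statbuf) == 0;
}
```

This keeps pidfile local to the helper and out of main().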

40.

```
     * XXX this code was extracted from BootStrapXLOG().
```

So, can we extract the common part somewhere? Since the system identifier is
related to the control file, I think it could be located in
controldata_utils.c.

41.

You said the following in [1], but I could not find the related fix. Can you
clarify?

> That's a good point. We should state in the documentation that GUCs specified in
> the command-line options are ignored during the execution.

[1]: https://www.postgresql.org/message-id/40595e73-c7e1-463a-b8be-49792e870007%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Re: speed up a logical replica setup

From
vignesh C
Date:
On Sat, 16 Mar 2024 at 21:13, Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
>
> Did you consider adding options for publication/subscription/slot
> names as mentioned in my previous email? As discussed in a few emails
> above, it would be quite confusing for users to identify the logical
> replication objects once the standby is converted to subscriber.
>
>
> Yes. I was wondering to implement after v1 is pushed. I started to write a code
> for it but I wasn't sure about the UI. The best approach I came up with was
> multiple options in the same order. (I don't provide short options to avoid
> possible portability issues with the order.) It means if I have 3 databases and
> the following command-line:
>
> pg_createsubscriber ... --database pg1 --database pg2 --database pg3 --publication
> pubx --publication puby --publication pubz
>
> pubx, puby and pubz are created in the database pg1, pg2, and pg3 respectively.
>
> It seems we care only for publications created on the primary. Isn't
> it possible that some of the publications have been replicated to
> standby by that time, for example, in case failure happens after
> creating a few publications? IIUC, we don't care for standby cleanup
> after failure because it can't be used for streaming replication
> anymore. So, the only choice the user has is to recreate the standby
> and start the pg_createsubscriber again. This sounds questionable to
> me as to whether users would like this behavior. Does anyone else have
> an opinion on this point?
>
>
> If it happens after creating a publication and before promotion, the cleanup
> routine will drop the publications on primary and it will eventually be applied
> to the standby via replication later.
>
> I see the below note in the patch:
> +    If <application>pg_createsubscriber</application> fails while processing,
> +    then the data directory is likely not in a state that can be recovered. It
> +    is true if the target server was promoted. In such a case, creating a new
> +    standby server is recommended.
>
> By reading this it is not completely clear whether the standby is not
> recoverable in case of any error or only an error after the target
> server is promoted. If others agree with this behavior then we should
> write the detailed reason for this somewhere in the comments as well
> unless it is already explained.
>
>
> I rewrote the sentence to make it clear that only if the server is promoted,
> the target server will be in a state that cannot be reused. It provides a
> message saying it too.
>
> pg_createsubscriber: target server reached the consistent state
> pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you
> must recreate the physical replica before continuing.
>
> I'm attaching a new version (v30) that adds:
>
> * 3 new options (--publication, --subscription, --replication-slot) to assign
>   names to the objects. The --database option used to ignore duplicate names,
>   however, since these new options rely on the number of database options to
>   match the number of object name options, it is forbidden from now on. The
>   duplication is also forbidden for the object names to avoid errors earlier.
> * rewrite the paragraph related to unusable target server after
>   pg_createsubscriber fails.

1) The maximum size of an object name is 64 bytes; we could add a check so
that we don't specify more than the maximum allowed length:
+			case 3:
+				if (!simple_string_list_member(&opt.replslot_names, optarg))
+				{
+					simple_string_list_append(&opt.replslot_names, optarg);
+					num_replslots++;
+				}
+				else
+				{
+					pg_log_error("duplicate replication slot \"%s\"", optarg);
+					exit(1);
+				}
+				break;

If we allow something like this:
 ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
In this case creation of replication slot will fail:
pg_createsubscriber: error: could not create replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
database "db2": ERROR:  replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
already exists

2) Similarly here too:
+			case 4:
+				if (!simple_string_list_member(&opt.sub_names, optarg))
+				{
+					simple_string_list_append(&opt.sub_names, optarg);
+					num_subs++;
+				}
+				else
+				{
+					pg_log_error("duplicate subscription \"%s\"", optarg);
+					exit(1);
+				}
+				break;

If we allow something like this:
./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3

Subscriptions will be created with the same name, and later there will
be a problem when setting the replication progress, as there will be
multiple subscriptions with the same name.
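
A client-side guard could be as small as the following (a hypothetical sketch, not code from the patch; NAMEDATALEN mirrors the server's limit, and truncation to NAMEDATALEN - 1 bytes is what makes the long names above collide):

```c
#include <stdbool.h>
#include <string.h>

#define NAMEDATALEN 64			/* server truncates names to NAMEDATALEN - 1 bytes */

/* Hypothetical check: reject an object name the server would truncate. */
static bool
object_name_fits(const char *name)
{
	return strlen(name) < NAMEDATALEN;
}
```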

Regards,
Vignesh



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 18.03.24 08:18, vignesh C wrote:
> 1) Maximum size of the object name is 64, we can have a check so that
> we don't specify more than the maximum allowed length:
> + case 3:
> + if (!simple_string_list_member(&opt.replslot_names, optarg))
> + {
> + simple_string_list_append(&opt.replslot_names, optarg);
> + num_replslots++;
> + }
> + else
> + {
> + pg_log_error("duplicate replication slot \"%s\"", optarg);
> + exit(1);
> + }
> + break;
> 
> If we allow something like this:
>   ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
> --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
> --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
> In this case creation of replication slot will fail:
> pg_createsubscriber: error: could not create replication slot
> "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
> database "db2": ERROR:  replication slot
> "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
> already exists

I think this is fine.  The server can check whether the names it is 
given are of the right size.  We don't need to check it again in the client.

> 2) Similarly here too:
> + case 4:
> + if (!simple_string_list_member(&opt.sub_names, optarg))
> + {
> + simple_string_list_append(&opt.sub_names, optarg);
> + num_subs++;
> + }
> + else
> + {
> + pg_log_error("duplicate subscription \"%s\"", optarg);
> + exit(1);
> + }
> + break;
> 
> If we allow something like this:
> ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
> --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
> --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3
> 
> Subscriptions will be created with the same name and later there will
> be a problem when setting replication progress as there will be
> multiple subscriptions with the same name.

Could you clarify this problem?




Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 18.03.24 06:43, Hayato Kuroda (Fujitsu) wrote:
> 02.
> 
> "The main difference between the logical replication setup and pg_createsubscriber
> is the initial data copy."
> 
> Grammarly suggests:
> "The initial data copy is the main difference between the logical replication
> setup and pg_createsubscriber."

I think that change is worse.

> 09.
> 
> "The source server must accept connections from the target server. The source server must not be in recovery."
> 
> Grammarly suggests:
> "The source server must accept connections from the target server and not be in recovery."

I think the previous version is better.

> 17.
> 
> "specifies promote"
> 
> We can do double-quote for the word promote.

The v30 patch has <literal>promote</literal>, which I think is adequate.

> 19.
> 
> "is accepting read-write transactions"
> 
> Grammarly suggests:
> "accepts read-write transactions"

I like the first one better.

> 20.
> 
> New options must be also documented as well. This helps not only users but also
> reviewers.
> (Sometimes we cannot identify whether the implementation is intentional or not.)

Which ones are missing?

> 21.
> 
> Also, not sure the specification is good. I preferred to specify them by format
> string. Because it can reduce the number of arguments and I cannot find use cases
> which user want to control the name of objects.
> 
> However, your approach has a benefit which users can easily identify the generated
> objects by pg_createsubscriber. How do other think?

I think listing them explicitly is better for the first version.  It's 
simpler to implement and more flexible.

> 22.
> 
> ```
> #define    BASE_OUTPUT_DIR        "pg_createsubscriber_output.d"
> ```
> 
> No one refers the define.

This is gone in v30.

> 23.
>   
> ```
> }            CreateSubscriberOptions;
> ...
> }            LogicalRepInfo;
> ```
> 
> The identifiers after the "}" are not needed, because we do not use typedef.

Yeah, this is actually wrong, because as it is written now, it defines 
global variables.

> 22.
> 
> While looking at the function definitions, I found that some pointers are
> declared as const, but others are not. E.g., "char *lsn" in setup_recovery()
> won't be changed but is not const. Is it just an omission, or is there another rule?

Yes, more could be done here.  I have attached a patch for this.  (This 
also requires the just-committed 48018f1d8c.)

> 24.
> 
> ```
> /* standby / subscriber data directory */
> static char *subscriber_dir = NULL;
> ```
> 
> It is a bit strange that only subscriber_dir is a global variable. Callers require
> the CreateSubscriberOptions as an argument, except cleanup_objects_atexit() and
> main(). So, how about making `CreateSubscriberOptions opt` a global one?

Fewer global variables seem preferable.  Only make global as needed.

> 30.
> 
> ```
>         if (num_replslots == 0)
>             dbinfo[i].replslotname = pg_strdup(genname);
> ```
> 
> I think the straightforward way is to use the name of subscription if no name
> is specified. This follows the rule for CREATE SUBSCRIPTION.

right

> 31.
> 
> ```
>         /* Create replication slot on publisher */
>         if (lsn)
>             pg_free(lsn);
> ```
> 
> I think allocating/freeing memory is not so efficient.
> Can we add a flag to create_logical_replication_slot() for controlling the
> returning value (NULL or duplicated string)? We can use the condition (i == num_dbs-1)
> as flag.

Nothing is even using the return value of 
create_logical_replication_slot().  I think this can be removed altogether.

> 34.
> 
> ```
>         {"config-file", required_argument, NULL, 1},
>         {"publication", required_argument, NULL, 2},
>         {"replication-slot", required_argument, NULL, 3},
>         {"subscription", required_argument, NULL, 4},
> ```
> 
> The ordering looks strange to me. In pg_upgrade and pg_basebackup, options
> that do not have a short form are listed last.
> 
> 35.
> 
> ```
>     opt.sub_port = palloc(16);
> ```
> 
> Per other lines, pg_malloc() should be used.

Even better psprintf().

> 37.
> 
> ```
>     /* Register a function to clean up objects in case of failure */
>     atexit(cleanup_objects_atexit);
> ```
> 
> Sorry if we have already discussed. I think the registration can be moved just
> before the boot of the standby. Before that, the callback will be no-op.

But it can also stay where it is.  What is the advantage of moving it later?

> 40.
> 
> ```
>      * XXX this code was extracted from BootStrapXLOG().
> ```
> 
> So, can we extract the common part to somewhere? Since system identifier is related
> with the controldata file, I think it can be located in controldata_util.c.

Let's leave it as is for this PG release.


Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 16.03.24 16:42, Euler Taveira wrote:
> I'm attaching a new version (v30) that adds:

I have some review comments and attached a patch with some smaller 
fixups (mainly message wording and avoiding fixed-size string buffers).

* doc/src/sgml/ref/pg_createsubscriber.sgml

I would remove the "How It Works" section.  This is not relevant to
users, and it is very detailed and will require updating whenever the
implementation changes.  It could be a source code comment instead.

* src/bin/pg_basebackup/pg_createsubscriber.c

I think the connection string handling is not robust against funny
characters, like spaces, in database names etc.

Most SQL commands need to be amended for proper identifier or string
literal quoting and/or escaping.

In check_subscriber(): All these permissions checks seem problematic
to me.  We shouldn't reimplement our own copy of the server's
permission checks.  The server can check the permissions.  And if the
permission checking in the server ever changes, then we have
inconsistencies to take care of.  Also, the error messages "permission
denied" are inappropriate, because we are not doing the actual thing.
Maybe we want to do a dry-run for the benefit of the user, but then we
should do the actual thing, like try to create a replication slot, or
whatever.  But I would rather just remove all this, it seems too
problematic.

In main(): The first check if the standby is running is problematic.
I think it would be better to require that the standby is initially
shut down.  Consider, the standby might be running under systemd.
This tool will try to stop it, systemd will try to restart it.  Let's
avoid these kinds of battles.  It's also safer if we don't try to
touch running servers.

The -p option (--subscriber-port) doesn't seem to do anything.  In my
testing, it always uses the compiled-in default port.

Printing all the server log lines to the terminal doesn't seem very
user-friendly.  Not sure what to do about that, short of keeping a 
pg_upgrade-style directory of log files.  But it's ugly.



Re: speed up a logical replica setup

From
Alvaro Herrera
Date:
On 2024-Mar-18, Peter Eisentraut wrote:

> * src/bin/pg_basebackup/pg_createsubscriber.c
> 
> I think the connection string handling is not robust against funny
> characters, like spaces, in database names etc.

Maybe it would be easier to deal with this by passing around a struct
with keyword/value pairs that can be given to PQconnectdbParams (and
keeping dbname as a param that's passed separately, so that it can be
added when one is needed), instead of messing with the string conninfos;
then you don't have to worry about quoting.
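
That shape can be sketched as follows (the struct and helper names are illustrative, not from the patch): collect settings as parallel NULL-terminated keyword/value arrays, append dbname when one is needed, and hand them to PQconnectdbParams() so libpq never has to parse a spliced conninfo string.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONN_PARAMS 8

/* Illustrative container for PQconnectdbParams-style parallel arrays. */
struct conn_params
{
	const char *keywords[MAX_CONN_PARAMS + 1];	/* NULL-terminated */
	const char *values[MAX_CONN_PARAMS + 1];
	int			n;
};

static void
add_conn_param(struct conn_params *cp, const char *keyword, const char *value)
{
	assert(cp->n < MAX_CONN_PARAMS);
	cp->keywords[cp->n] = keyword;
	cp->values[cp->n] = value;
	cp->n++;
	cp->keywords[cp->n] = NULL;	/* keep both arrays terminated */
	cp->values[cp->n] = NULL;
}
```

A caller would then do something like `PQconnectdbParams(cp.keywords, cp.values, 0)`; because each value is passed verbatim, a database name such as `my db` needs no quoting or escaping.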

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."                   (New York Times, about Microsoft PowerPoint)



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Peter,

Thanks for the comments. I want to reply to some of them.

> > 17.
> >
> > "specifies promote"
> >
> > We can do double-quote for the word promote.
> 
> The v30 patch has <literal>promote</literal>, which I think is adequate.

Oops. Actually, I was looking at the v29 patch during my first review. Sorry
for the noise.

> 
> > 20.
> >
> > New options must be also documented as well. This helps not only users but
> also
> > reviewers.
> > (Sometimes we cannot identify that the implementation is intentinal or not.)
> 
> Which ones are missing?

In v29, the newly added options (publication/subscription/replication-slot)
were not documented. Since they have been added, please ignore.

> > 21.
> >
> > Also, not sure the specification is good. I preferred to specify them by format
> > string. Because it can reduce the number of arguments and I cannot find use
> cases
> > which user want to control the name of objects.
> >
> > However, your approach has a benefit which users can easily identify the
> generated
> > objects by pg_createsubscriber. How do other think?
> 
> I think listing them explicitly is better for the first version.  It's
> simpler to implement and more flexible.

OK.

> > 22.
> >
> > ```
> > #define    BASE_OUTPUT_DIR
>     "pg_createsubscriber_output.d"
> > ```
> >
> > No one refers the define.
> 
> This is gone in v30.

I wrote that because of the above. Please ignore...

> 
> > 31.
> >
> > ```
> >         /* Create replication slot on publisher */
> >         if (lsn)
> >             pg_free(lsn);
> > ```
> >
> > I think allocating/freeing memory is not so efficient.
> > Can we add a flag to create_logical_replication_slot() for controlling the
> > returning value (NULL or duplicated string)? We can use the condition (i ==
> num_dbs-1)
> > as flag.
> 
> Nothing is even using the return value of
> create_logical_replication_slot().  I think this can be removed altogether.

> > 37.
> >
> > ```
> >     /* Register a function to clean up objects in case of failure */
> >     atexit(cleanup_objects_atexit);
> > ```
> >
> > Sorry if we have already discussed. I think the registration can be moved just
> > before the boot of the standby. Before that, the callback will be no-op.
> 
> But it can also stay where it is.  What is the advantage of moving it later?

I thought we could reduce the risk of bugs. Previously, some bugs were
reported because the registration was done too early. However, this is not a
strong opinion.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Mar 18, 2024 at 7:22 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> In check_subscriber(): All these permissions checks seem problematic
> to me.  We shouldn't reimplement our own copy of the server's
> permission checks.  The server can check the permissions.  And if the
> permission checking in the server ever changes, then we have
> inconsistencies to take care of.  Also, the error messages "permission
> denied" are inappropriate, because we are not doing the actual thing.
> Maybe we want to do a dry-run for the benefit of the user, but then we
> should do the actual thing, like try to create a replication slot, or
> whatever.  But I would rather just remove all this, it seems too
> problematic.
>

If we remove all the checks then there is a possibility that we
fail later while creating the actual subscription. For example, if
max_replication_slots is not sufficient, it is bound to
fail in the later steps, which would be a costlier affair because by
that time the standby would have been promoted and the user won't have
any way to move forward but to re-create the standby and then use this
tool again. I think here the patch tries to mimic pg_upgrade-style
checks where we do some pre-checks.

This raises a question in my mind: how are we expecting users to know
all these required values and configure them properly before using this
tool?  IIUC, we are expecting that the user should figure out the
appropriate values for max_replication_slots,
max_logical_replication_workers, etc. by querying the number of
databases on the primary, and then either stop the standby to change
these parameters or use ALTER SYSTEM, depending on the parameter.
Similarly, there are some config requirements (like max_wal_senders,
max_replication_slots) for the primary which would be difficult for
users to know, as these are the tool's internal requirements.

The two possibilities that come to my mind are (a) pg_createsubscriber
should have some special mode/option using which the user can find out
all the required config settings, or (b) document how a user can find
the required settings. There are good chances of mistakes with option
(b).
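
The pre-check described above boils down to arithmetic like the following (a hypothetical sketch; in the real tool the numbers would come from querying the servers, e.g. via SHOW):

```c
#include <stdbool.h>

/* Hypothetical shape of such a pre-check: before promoting anything,
 * compare the number of databases to be converted against the
 * configured limit, accounting for slots already in use. */
static bool
enough_replication_slots(int num_dbs, int max_replication_slots,
						 int slots_in_use)
{
	return (max_replication_slots - slots_in_use) >= num_dbs;
}
```

Running such checks before the standby is promoted is what makes failure cheap to recover from.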

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

> I'm attaching a new version (v30) that adds:
>
> * 3 new options (--publication, --subscription, --replication-slot) to assign
>   names to the objects. The --database option used to ignore duplicate names,
>   however, since these new options rely on the number of database options to
>   match the number of object name options, it is forbidden from now on. The
>   duplication is also forbidden for the object names to avoid errors earlier.
> * rewrite the paragraph related to unusable target server after
>   pg_createsubscriber fails.
> * Vignesh reported an issue [1] related to reaching the recovery stop point
>   before the consistent state is reached. I proposed a simple patch that fixes
>   the issue.
>
> [1] https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com
>

I have added a top-up patch v30-0003. The issue in [1] still exists in
the v30 patch. I was not able to come up with an approach to handle it
in the code, so I have added it to the documentation in the warning
section. Thoughts?
I am not changing the version as I have not made any changes in
v30-0001 and v30-0002.

[1]: https://www.postgresql.org/message-id/CAHv8Rj+5mzK9Jt+7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g@mail.gmail.com



Thanks and regards,
Shlok Kyal


Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 19.03.24 08:05, Amit Kapila wrote:
> On Mon, Mar 18, 2024 at 7:22 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> In check_subscriber(): All these permissions checks seem problematic
>> to me.  We shouldn't reimplement our own copy of the server's
>> permission checks.  The server can check the permissions.  And if the
>> permission checking in the server ever changes, then we have
>> inconsistencies to take care of.  Also, the error messages "permission
>> denied" are inappropriate, because we are not doing the actual thing.
>> Maybe we want to do a dry-run for the benefit of the user, but then we
>> should do the actual thing, like try to create a replication slot, or
>> whatever.  But I would rather just remove all this, it seems too
>> problematic.
>>
> 
> If we remove all the checks then there is a possibility that we can
> fail later while creating the actual subscription. For example, if
> there are not sufficient max_replication_slots, then it is bound to
> fail in the later steps which would be a costlier affair because by
> that time the standby would have been promoted and the user won't have
> any way to move forward but to re-create standby and then use this
> tool again. I think here the patch tries to mimic pg_upgrade style
> checks where we do some pre-checks.

I think checking for required parameter settings is fine.  My concern is 
with the code before that, that does 
pg_has_role/has_database_privilege/has_function_privilege.




Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 19.03.24 12:26, Shlok Kyal wrote:
>> I'm attaching a new version (v30) that adds:
>>
>> * 3 new options (--publication, --subscription, --replication-slot) to assign
>>    names to the objects. The --database option used to ignore duplicate names,
>>    however, since these new options rely on the number of database options to
>>    match the number of object name options, it is forbidden from now on. The
>>    duplication is also forbidden for the object names to avoid errors earlier.
>> * rewrite the paragraph related to unusable target server after
>>    pg_createsubscriber fails.
>> * Vignesh reported an issue [1] related to reaching the recovery stop point
>>    before the consistent state is reached. I proposed a simple patch that fixes
>>    the issue.
>>
>> [1] https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com
>>
> 
> I have added a top-up patch v30-0003. The issue in [1] still exists in
> the v30 patch. I was not able to come up with an approach to handle it
> in the code, so I have added it to the documentation in the warning
> section. Thoughts?

Seems acceptable to me.  pg_createsubscriber will probably always have 
some restrictions and unsupported edge cases like that.  We can't 
support everything, so documenting is ok.




Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 18 Mar 2024 at 16:36, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 18.03.24 08:18, vignesh C wrote:
> > 1) Maximum size of the object name is 64, we can have a check so that
> > we don't specify more than the maximum allowed length:
> > + case 3:
> > + if (!simple_string_list_member(&opt.replslot_names, optarg))
> > + {
> > + simple_string_list_append(&opt.replslot_names, optarg);
> > + num_replslots++;
> > + }
> > + else
> > + {
> > + pg_log_error("duplicate replication slot \"%s\"", optarg);
> > + exit(1);
> > + }
> > + break;
> >
> > If we allow something like this:
> >   ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> > user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> > db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
> > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
> > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
> > In this case creation of replication slot will fail:
> > pg_createsubscriber: error: could not create replication slot
> > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
> > database "db2": ERROR:  replication slot
> > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
> > already exists
>
> I think this is fine.  The server can check whether the names it is
> given are of the right size.  We don't need to check it again in the client.
>
> > 2) Similarly here too:
> > + case 4:
> > + if (!simple_string_list_member(&opt.sub_names, optarg))
> > + {
> > + simple_string_list_append(&opt.sub_names, optarg);
> > + num_subs++;
> > + }
> > + else
> > + {
> > + pg_log_error("duplicate subscription \"%s\"", optarg);
> > + exit(1);
> > + }
> > + break;
> >
> > If we allow something like this:
> > ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> > user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> > db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
> > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
> > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3
> >
> > Subscriptions will be created with the same name and later there will
> > be a problem when setting replication progress as there will be
> > multiple subscriptions with the same name.
>
> Could you clarify this problem?

In this case the subscription names specified are longer than the allowed
length, so the names will be truncated and the subscriptions for db1 and
db2 will end up with the same name, as shown below:
db2=# select subname, subdbid from pg_subscription;
                             subname                             | subdbid
-----------------------------------------------------------------+---------
 testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes |   16384
 testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes |   16385

Now we try to set the replication origin of the subscriptions to a
consistent lsn from the following:
+set_replication_progress(PGconn *conn, struct LogicalRepInfo *dbinfo,
const char *lsn)
+{
+       PQExpBuffer str = createPQExpBuffer();
+       PGresult   *res;
+       Oid                     suboid;
+       char            originname[NAMEDATALEN];
+       char            lsnstr[17 + 1]; /* MAXPG_LSNLEN = 17 */
+
+       Assert(conn != NULL);
+
+       appendPQExpBuffer(str,
+                                         "SELECT oid FROM
pg_catalog.pg_subscription "
+                                         "WHERE subname = '%s'",
+                                         dbinfo->subname);
+
+       res = PQexec(conn, str->data);
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not obtain subscription OID: %s",
+                                        PQresultErrorMessage(res));
+               disconnect_database(conn, true);
+       }
+
+       if (PQntuples(res) != 1 && !dry_run)
+       {
+               pg_log_error("could not obtain subscription OID: got
%d rows, expected %d rows",
+                                        PQntuples(res), 1);
+               disconnect_database(conn, true);
+       }

Since the subscription name is truncated, we will have multiple
records returned for the above query which results in failure with:
pg_createsubscriber: error: could not obtain subscription OID: got 2
rows, expected 1 rows
pg_createsubscriber: warning: pg_createsubscriber failed after the end
of recovery
pg_createsubscriber: hint: The target server cannot be used as a
physical replica anymore.
pg_createsubscriber: hint: You must recreate the physical replica
before continuing.

The problem with this failure is that the standby has already been promoted
and we will have to re-create the physical replica again.

If you are not planning to have the checks for name length, this could
alternatively be fixed by including database id also while querying
pg_subscription like below in set_replication_progress function:
appendPQExpBuffer(str,
"SELECT oid FROM pg_catalog.pg_subscription \n"
"WHERE subname = '%s' AND subdbid = (SELECT oid FROM
pg_catalog.pg_database WHERE datname = '%s')",
dbinfo->subname,
dbinfo->dbname);

I have verified this fixes the issue.
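For reference, the collision stems from PostgreSQL silently truncating identifiers to NAMEDATALEN - 1 (63) bytes. A minimal sketch of why the three 64-character names above collapse to one (plain Python, just to illustrate the truncation rule; not part of the patch):

```python
NAMEDATALEN = 64  # PostgreSQL's compile-time identifier limit (includes the NUL)

# The three names passed via --subscription: 60 chars of "test" plus "tes1" etc.
names = ["test" * 15 + f"tes{i}" for i in (1, 2, 3)]
assert all(len(n) == 64 for n in names)

# The server silently truncates identifiers to NAMEDATALEN - 1 = 63 bytes,
# which drops the distinguishing trailing digit.
truncated = [n[:NAMEDATALEN - 1] for n in names]

# All three collapse to the same subscription name, so an OID lookup by
# subname alone returns multiple rows.
print(len(set(truncated)))  # distinct names after truncation
```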

Regards,
Vignesh



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 19.03.24 16:24, vignesh C wrote:
> The problem with this failure is that the standby has already been promoted
> and we will have to re-create the physical replica again.
> 
> If you are not planning to have the checks for name length, this could
> alternatively be fixed by including database id also while querying
> pg_subscription like below in set_replication_progress function:
> appendPQExpBuffer(str,
> "SELECT oid FROM pg_catalog.pg_subscription \n"
> "WHERE subname = '%s' AND subdbid = (SELECT oid FROM
> pg_catalog.pg_database WHERE datname = '%s')",
> dbinfo->subname,
> dbinfo->dbname);

Yes, this is more correct anyway, because subscription names are 
per-database, not global.  So you should be able to make 
pg_createsubscriber use the same subscription name for each database.
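Since subscription names are only unique per database, keying the lookup on (subname, subdbid) is enough to disambiguate. A toy sketch of that lookup logic (plain Python simulating the two catalog rows from the truncated-name example above; the row values are illustrative, not real catalog contents):

```python
# Simulated pg_subscription rows: same truncated subname, different databases.
pg_subscription = [
    {"oid": 20001, "subname": "trunc_name", "subdbid": 16384},
    {"oid": 20002, "subname": "trunc_name", "subdbid": 16385},
]

def subscription_oid(subname, subdbid):
    """Mimic the fixed query: filter by both subname and subdbid."""
    rows = [r for r in pg_subscription
            if r["subname"] == subname and r["subdbid"] == subdbid]
    assert len(rows) == 1, f"got {len(rows)} rows, expected 1"
    return rows[0]["oid"]

# Filtering on subname alone would match two rows; adding subdbid is unambiguous.
print(subscription_oid("trunc_name", 16385))  # → 20002
```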





Re: speed up a logical replica setup

From
Shlok Kyal
Date:
> Hi,
>
> > I'm attaching a new version (v30) that adds:
> >
> > * 3 new options (--publication, --subscription, --replication-slot) to assign
> >   names to the objects. The --database option used to ignore duplicate names,
> >   however, since these new options rely on the number of database options to
> >   match the number of object name options, it is forbidden from now on. The
> >   duplication is also forbidden for the object names to avoid errors earlier.
> > * rewrite the paragraph related to unusable target server after
> >   pg_createsubscriber fails.
> > * Vignesh reported an issue [1] related to reaching the recovery stop point
> >   before the consistent state is reached. I proposed a simple patch that fixes
> >   the issue.
> >
> > [1] https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com
> >
>
> I have added a top-up patch v30-0003. The issue in [1] still exists in
> the v30 patch. I was not able to come up with an approach to handle it
> in the code, so I have added it to the documentation in the warning
> section. Thoughts?
> I am not changing the version as I have not made any changes in
> v30-0001 and v30-0002.
>
> [1]: https://www.postgresql.org/message-id/CAHv8Rj+5mzK9Jt+7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g@mail.gmail.com

There was a whitespace error in the v30-0003 patch. Updated the patch.



Thanks and regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Tue, Mar 19, 2024 at 8:54 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Mon, 18 Mar 2024 at 16:36, Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > On 18.03.24 08:18, vignesh C wrote:
> > > 1) Maximum size of the object name is 64, we can have a check so that
> > > we don't specify more than the maximum allowed length:
> > > + case 3:
> > > + if (!simple_string_list_member(&opt.replslot_names, optarg))
> > > + {
> > > + simple_string_list_append(&opt.replslot_names, optarg);
> > > + num_replslots++;
> > > + }
> > > + else
> > > + {
> > > + pg_log_error("duplicate replication slot \"%s\"", optarg);
> > > + exit(1);
> > > + }
> > > + break;
> > >
> > > If we allow something like this:
> > >   ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> > > user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> > > db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
> > > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
> > > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
> > > In this case creation of replication slot will fail:
> > > pg_createsubscriber: error: could not create replication slot
> > > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
> > > database "db2": ERROR:  replication slot
> > > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
> > > already exists
> >
> > I think this is fine.  The server can check whether the names it is
> > given are of the right size.  We don't need to check it again in the client.
> >
> > > 2) Similarly here too:
> > > + case 4:
> > > + if (!simple_string_list_member(&opt.sub_names, optarg))
> > > + {
> > > + simple_string_list_append(&opt.sub_names, optarg);
> > > + num_subs++;
> > > + }
> > > + else
> > > + {
> > > + pg_log_error("duplicate subscription \"%s\"", optarg);
> > > + exit(1);
> > > + }
> > > + break;
> > >
> > > If we allow something like this:
> > > ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> > > user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> > > db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
> > > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
> > > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3
> > >
> > > Subscriptions will be created with the same name and later there will
> > > be a problem when setting replication progress as there will be
> > > multiple subscriptions with the same name.
> >
> > Could you clarify this problem?
>
> In this case the subscription names specified are longer than the allowed
> length, so the names will be truncated and the subscriptions for db1 and
> db2 will end up with the same name, as shown below:
> db2=# select subname, subdbid from pg_subscription;
>                              subname                             | subdbid
> -----------------------------------------------------------------+---------
>  testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes |   16384
>  testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes |   16385
>
> Now we try to set the replication origin of the subscriptions to a
> consistent lsn from the following:
> +set_replication_progress(PGconn *conn, struct LogicalRepInfo *dbinfo,
> const char *lsn)
> +{
> +       PQExpBuffer str = createPQExpBuffer();
> +       PGresult   *res;
> +       Oid                     suboid;
> +       char            originname[NAMEDATALEN];
> +       char            lsnstr[17 + 1]; /* MAXPG_LSNLEN = 17 */
> +
> +       Assert(conn != NULL);
> +
> +       appendPQExpBuffer(str,
> +                                         "SELECT oid FROM
> pg_catalog.pg_subscription "
> +                                         "WHERE subname = '%s'",
> +                                         dbinfo->subname);
> +
> +       res = PQexec(conn, str->data);
> +       if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +       {
> +               pg_log_error("could not obtain subscription OID: %s",
> +                                        PQresultErrorMessage(res));
> +               disconnect_database(conn, true);
> +       }
> +
> +       if (PQntuples(res) != 1 && !dry_run)
> +       {
> +               pg_log_error("could not obtain subscription OID: got
> %d rows, expected %d rows",
> +                                        PQntuples(res), 1);
> +               disconnect_database(conn, true);
> +       }
>
> Since the subscription name is truncated, we will have multiple
> records returned for the above query which results in failure with:
> pg_createsubscriber: error: could not obtain subscription OID: got 2
> rows, expected 1 rows
> pg_createsubscriber: warning: pg_createsubscriber failed after the end
> of recovery
> pg_createsubscriber: hint: The target server cannot be used as a
> physical replica anymore.
> pg_createsubscriber: hint: You must recreate the physical replica
> before continuing.
>
> The problem with this failure is that the standby has already been promoted
> and we will have to re-create the physical replica again.
>
> If you are not planning to have the checks for name length, this could
> alternatively be fixed by including database id also while querying
> pg_subscription like below in set_replication_progress function:
> appendPQExpBuffer(str,
> "SELECT oid FROM pg_catalog.pg_subscription \n"
> "WHERE subname = '%s' AND subdbid = (SELECT oid FROM
> pg_catalog.pg_database WHERE datname = '%s')",
> dbinfo->subname,
> dbinfo->dbname);

The attached patch has the changes to handle the same.

Thanks and Regards,
Shubham Khanna.

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Mar 18, 2024, at 10:52 AM, Peter Eisentraut wrote:
On 16.03.24 16:42, Euler Taveira wrote:
> I'm attaching a new version (v30) that adds:

I have some review comments and attached a patch with some smaller 
fixups (mainly message wording and avoid fixed-size string buffers).

Thanks for your review. I'm attaching a new version (v32) that includes your
fixups, merges the v30-0002 into the main patch [1], addresses Hayato's review[2],
your reviews [3][4], and fixes the query for set_replication_progress() [5].

* doc/src/sgml/ref/pg_createsubscriber.sgml

I would remove the "How It Works" section.  This is not relevant to
users, and it is very detailed and will require updating whenever the
implementation changes.  It could be a source code comment instead.

It uses the same structure as pg_rewind that also describes how it works
internally. I included a separate patch that completely removes the section.

* src/bin/pg_basebackup/pg_createsubscriber.c

I think the connection string handling is not robust against funny
characters, like spaces, in database names etc.

get_base_conninfo() uses PQconninfoParse to parse the connection string. I
expect PQconnectdb to provide a suitable error message in this case. Even if it
builds keywords and values arrays, it is also susceptible to the same issue, no?

Most SQL commands need to be amended for proper identifier or string
literal quoting and/or escaping.

I completely forgot about this detail when I added the new options in v30. It is
fixed now. I also changed the tests to exercise it.
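For illustration, the quoting rules at stake can be sketched as follows (plain Python mimicking the standard SQL convention of doubling embedded quotes; the actual patch presumably uses libpq helpers such as PQescapeIdentifier, and real code must also account for the connection encoding):

```python
def quote_identifier(name: str) -> str:
    """Double embedded double quotes and wrap in double quotes (SQL identifier)."""
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value: str) -> str:
    """Double embedded single quotes and wrap in single quotes (SQL literal)."""
    return "'" + value.replace("'", "''") + "'"

# A database name with funny characters stays a single, safe identifier:
print(quote_identifier('my "weird" db'))   # "my ""weird"" db"
print(quote_literal("it's a db"))          # 'it''s a db'
```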

In check_subscriber(): All these permissions checks seem problematic
to me.  We shouldn't reimplement our own copy of the server's
permission checks.  The server can check the permissions.  And if the
permission checking in the server ever changes, then we have
inconsistencies to take care of.  Also, the error messages "permission
denied" are inappropriate, because we are not doing the actual thing.
Maybe we want to do a dry-run for the benefit of the user, but then we
should do the actual thing, like try to create a replication slot, or
whatever.  But I would rather just remove all this, it seems too
problematic.

The main goal of the check_* functions is to minimize errors during execution.
I removed the permission checks. The GUC checks were kept.

In main(): The first check if the standby is running is problematic.
I think it would be better to require that the standby is initially
shut down.  Consider, the standby might be running under systemd.
This tool will try to stop it, systemd will try to restart it.  Let's
avoid these kinds of battles.  It's also safer if we don't try to
touch running servers.

That's a good point. I hadn't found an excuse to simplify this but you provided
one. :) There was a worry about ignoring some command-line options that change
GUCs if the server was started. There was also an ugly case for dry run mode
that has to start the server (if it was running) at the end. Both cases are no
longer issues. The current code provides a suitable error if the target server
is running.

The -p option (--subscriber-port) doesn't seem to do anything.  In my
testing, it always uses the compiled-in default port.

It works for me. See this snippet from the regression tests. The port (50945) is
used by pg_ctl.

# Running: pg_createsubscriber --verbose --verbose --pgdata /c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata --publisher-server port=50943 host=/tmp/qpngb0bPKo dbname='pg1' --socket-directory /tmp/qpngb0bPKo --subscriber-port 50945 --database pg1 --database pg2
pg_createsubscriber: validating connection string on publisher
.
.
pg_createsubscriber: pg_ctl command is: "/c/pg_createsubscriber/tmp_install/c/pg_createsubscriber/bin/pg_ctl" start -D "/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata" -s -o "-p 50945" -o "-c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/tmp/qpngb0bPKo'"
2024-03-20 18:15:24.517 -03 [105195] LOG:  starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-03-20 18:15:24.517 -03 [105195] LOG:  listening on Unix socket "/tmp/qpngb0bPKo/.s.PGSQL.50945"

Printing all the server log lines to the terminal doesn't seem very
user-friendly.  Not sure what to do about that, short of keeping a 
pg_upgrade-style directory of log files.  But it's ugly.

I removed the previous implementation that creates a new directory and stores
the log file there. I don't like the pg_upgrade-style directory because (a) it
stores part of the server log files in another place and (b) it is another
directory to ignore if your tool handles the data directory (like a backup
tool). My last test said it prints 35 server log lines. I expect that the user
redirects the output to a file so he/she can inspect it later if required.



--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Mar 18, 2024, at 2:43 AM, Hayato Kuroda (Fujitsu) wrote:
Thanks for updating the patch. Here are my comments.
I used Grammarly to proofread sentences.
(The tool strongly recommends using the active voice, but I can ignore that for now)

Thanks for another review. I posted a new patch (v32) that hopefully addresses
these points.

01.

"After a successful run, the state of the target server is analagous to a fresh
logical replication setup."
a/analagous/analogous

Fixed.

02.

"The main difference between the logical replication setup and pg_createsubscriber
is the initial data copy."

Grammarly suggests:
"The initial data copy is the main difference between the logical replication
setup and pg_createsubscriber."

Not fixed.

03.

"Only the synchronization phase is done, which ensures each table is brought up
to a synchronized state."

This sentence is not very clear to me. How about:
"pg_createsubscriber does only the synchronization phase, ensuring each table's
replication state is ready."

I avoided pg_createsubscriber at the beginning because it is already
used in the previous sentence. I kept the last part of the sentence
because it is similar to one in the logical replication [1].

04.

"The pg_createsubscriber targets large database systems because most of the
execution time is spent making the initial data copy."

Hmm, but the initial data sync by logical replication also spends most of its
time making the initial data copy. IIUC the bottlenecks are a) this application
must stop and start the server several times, and b) only serial copy works.
Can you clarify?

Reading the sentence again, it is not clear. When I said "most of
execution time" I was referring to the actual logical replication setup.

05.

It is better to say the internal difference between pg_createsubscriber and the
initial sync by logical replication. For example:
pg_createsubscriber uses a physical replication mechanism to ensure the standby
catches up to a certain point. Then, it converts the standby to a subscriber by
promoting it and creating subscriptions.

Isn't it better to leave these details to "How It Works"?

06.

"If these are not met an error will be reported."

Grammarly suggests:
"If these are not met, an error will be reported."

Fixed.

07.

"The given target data directory must have the same system identifier than the
source data directory."

Grammarly suggests:
"The given target data directory must have the same system identifier as the
source data directory."

Fixed.

08.

"If a standby server is running on the target data directory or it is a base
backup from the source data directory, system identifiers are the same."

This line is not needed if bullet-style is not used. The line is just a
supplement, not a prerequisite.

Fixed.

09.

"The source server must accept connections from the target server. The source server must not be in recovery."

Grammarly suggests:
"The source server must accept connections from the target server and not be in recovery."

Not fixed.

10.

"Publications cannot be created in a read-only cluster."

Same as 08, this line is not needed if bullet-style is not used.

Fixed.

11.

"pg_createsubscriber usually starts the target server with different connection
settings during the transformation steps. Hence, connections to target server
might fail."
 
Grammarly suggests:
"pg_createsubscriber usually starts the target server with different connection
settings during transformation. Hence, connections to the target server might fail."

Fixed.

12.

"During the recovery process,"

Grammarly suggests:
"During recovery,"

Not fixed. Our documentation uses "recovery process".

13.

"replicated so an error would occur."

Grammarly suggests:
"replicated, so an error would occur."

I didn't find this one. Maybe you checked a previous version.

14.

"It would avoid situations in which WAL files from the source server might be
used by the target server."

Grammarly suggests:
"It would avoid situations in which the target server might use WAL files from
the source server."

Fixed.

15.

"a LSN"

s/a/an

Fixed.

16.

"of write-ahead"

s/of/of the/

Fixed.

17.

"specifies promote"

We can do double-quote for the word promote.

Why? It is referring to recovery_target_action. If you check this GUC,
you will notice that it also uses the literal tag.

18.

"are also added so it avoids"

Grammarly suggests:
"are added to avoid"

Fixed.

19.

"is accepting read-write transactions"

Grammarly suggests:
"accepts read-write transactions"

Not fixed.

20.

New options must also be documented. This helps not only users but also
reviewers.
(Sometimes we cannot tell whether the implementation is intentional or not.)

I don't know what you are referring to. If the new options are
--publication, --subscription and --replication-slot, they are
documented. Are you checking the latest patch?

21.

Also, I am not sure the specification is good. I would prefer to specify them by
a format string, because it can reduce the number of arguments, and I cannot find
use cases in which users want to control the names of the objects.

However, your approach has the benefit that users can easily identify the
objects generated by pg_createsubscriber. What do others think?

I prefer explicit options. We can always expand it later if people think it is a
good idea to provide a format string.

22.

```
#define BASE_OUTPUT_DIR "pg_createsubscriber_output.d"
```

Nothing refers to this define.

It was removed in v30.

23.
 
```
} CreateSubscriberOptions;
...
} LogicalRepInfo;
```

Declarations after the "{" are not needed, because we do not do typedef.

It is a leftover when I removed the typedef.

22.

While looking at the function definitions, I found that some pointers are declared
as const, but others are not. E.g., "char *lsn" in setup_recovery() won't be
changed but is not declared const. Is it just missing or is there another rule?

It slipped my mind. Peter's fixups improves it.

23.

```
static int num_dbs = 0;
static int num_pubs = 0;
static int num_subs = 0;
static int num_replslots = 0;
```

I think the names are a bit confusing. The number of publications/subscriptions/replication slots
to be generated is always the same as the number of databases. These variables
just indicate how many were specified.

My idea is num_custom_pubs or something. Thought?

What does "custom" add to make the name clear? I added comments saying
so.

24.

```
/* standby / subscriber data directory */
static char *subscriber_dir = NULL;
```

It is a bit strange that only subscriber_dir is a global variable. Callers require
the CreateSubscriberOptions as an argument, except cleanup_objects_atexit() and
main(). So, how about making `CreateSubscriberOptions opt` a global one?

I avoided turning all the options into global variables. Since the cleanup routine
required the target data directory to be a global variable, I just did it and
left the others alone.

25.

```
* Replication slots, publications and subscriptions are created. Depending on
* the step it failed, it should remove the already created objects if it is
* possible (sometimes it won't work due to a connection issue).
```

I think it should be specified here that subscriptions won't be removed, along
with the reason.

I rephrased this comment.

26.

```

/*
* If the server is promoted, there is no way to use the current setup
* again. Warn the user that a new replication setup should be done before
* trying again.
*/
```

Per comment 25, we can add a reference like "See comments atop the function"

It is a few lines above. I don't think you have to point it out. If you
are unsure about this decision, you should check the whole function.

27.

usage() was not updated based on recent changes.

Check v30.

28.

```
if (strcmp(conn_opt->keyword, "dbname") == 0 && conn_opt->val != NULL)
{
if (dbname)
*dbname = pg_strdup(conn_opt->val);
continue;
}
```

There is a memory leak if multiple dbname values are specified in the conninfo.

It is not a worrying or critical memory leak.

29.

```
pg_prng_seed(&prng_state, (uint64) (getpid() ^ time(NULL)));
```

No need to initialize the seed every time. Can you reuse pg_prng_state?

Sure.

30.

```
if (num_replslots == 0)
dbinfo[i].replslotname = pg_strdup(genname);
```

I think the straightforward way is to use the name of subscription if no name
is specified. This follows the rule for CREATE SUBSCRIPTION.

Agreed.

31.

```
/* Create replication slot on publisher */
if (lsn)
pg_free(lsn);
```

I think allocating/freeing memory like this is not so efficient.
Can we add a flag to create_logical_replication_slot() to control the return
value (NULL or a duplicated string)? We can use the condition (i == num_dbs-1)
as the flag.

It is not. This code path is not critical. You are suggesting adding
complexity here. Efficiency is a good goal, but in this case it only adds
complexity with little return.

32.

```
/*
* Close the connection. If exit_on_error is true, it has an undesired
* condition and it should exit immediately.
*/
static void
disconnect_database(PGconn *conn, bool exit_on_error)
```

In case of disconnect_database(), the second argument should have a different
name. If it is true, the process exits unconditionally.
Also, the comments atop the function must be fixed.

I chose a short name. The comment seems OK to me.


33.

```
wal_level = strdup(PQgetvalue(res, 0, 0));
```

pg_strdup should be used here.

Fixed.

34.

```
{"config-file", required_argument, NULL, 1},
{"publication", required_argument, NULL, 2},
{"replication-slot", required_argument, NULL, 3},
{"subscription", required_argument, NULL, 4},
```

The ordering looks strange to me. According to pg_upgrade and pg_basebackup,
options which do not have a short notation are listed last.

Fixed.

35.

```
opt.sub_port = palloc(16);
```

Per other lines, pg_alloc() should be used.

I think you meant pg_malloc. Fixed.

36.

```
pg_free(opt.sub_port);
```

You said that the leak won't be concerned here. If so, why only 'p' has pg_free()?

Fixed.

37.

```
/* Register a function to clean up objects in case of failure */
atexit(cleanup_objects_atexit);
```

Sorry if we have already discussed this. I think the registration can be moved to
just before the boot of the standby. Before that, the callback is a no-op.

The main reason is to catch future cases added *before* the point you
want to move this call to that require a cleanup. As you said, it is a
no-op. My preference for atexit() calls is to add them as early as
possible to avoid missing cases in which they should trigger.

38.

```
/* Subscriber PID file */
snprintf(pidfile, MAXPGPATH, "%s/postmaster.pid", subscriber_dir);

/*
* If the standby server is running, stop it. Some parameters (that can
* only be set at server start) are informed by command-line options.
*/
if (stat(pidfile, &statbuf) == 0)
```

Hmm. pidfile is used only here, but it is declared in main(). Can it be
separated into another function like is_standby_started()?

It is so small that I didn't bother adding a new function for it.

39.

Or, we may be able to introduce "restart_standby_if_needed" or something.

40.

```
* XXX this code was extracted from BootStrapXLOG().
```

So, can we extract the common part somewhere? Since the system identifier is
related to the controldata file, I think it could be located in controldata_util.c.

I added this comment here as a reference from where I extracted the
code. The referred function is from backend. Feel free to propose a
separate patch for it.

41.

You said the following in [1], but I could not find the related fix. Can you clarify?

> That's a good point. We should state in the documentation that GUCs specified in
> the command-line options are ignored during the execution.

I added a sentence for it. See "How It Works".




--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Mar 19, 2024, at 8:57 AM, Peter Eisentraut wrote:
On 19.03.24 12:26, Shlok Kyal wrote:

> I have added a top-up patch v30-0003. The issue in [1] still exists in
> the v30 patch. I was not able to come up with an approach to handle it
> in the code, so I have added it to the documentation in the warning
> section. Thoughts?

Seems acceptable to me.  pg_createsubscriber will probably always have 
some restrictions and unsupported edge cases like that.  We can't 
support everything, so documenting is ok.

Shlok, I'm not sure we should add a sentence about a pilot error. I added a
comment in check_subscriber that describes this situation. I think the comment
is sufficient to understand the limitation and, if it is possible in the future,
a check might be added for it. I didn't include v31-0004.


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, Mar 20, 2024, at 7:16 AM, Shubham Khanna wrote:
On Tue, Mar 19, 2024 at 8:54 PM vignesh C <vignesh21@gmail.com> wrote:
>
> If you are not planning to have the checks for name length, this could
> alternatively be fixed by including database id also while querying
> pg_subscription like below in set_replication_progress function:
> appendPQExpBuffer(str,
> "SELECT oid FROM pg_catalog.pg_subscription \n"
> "WHERE subname = '%s' AND subdbid = (SELECT oid FROM
> pg_catalog.pg_database WHERE datname = '%s')",
> dbinfo->subname,
> dbinfo->dbname);

The attached patch has the changes to handle the same.

I included a different query that does the same. See v32.


--
Euler Taveira

Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,

There is a compilation error while building postgres with the patch
due to a recent commit. I have attached a top-up patch v32-0003 to
resolve this compilation error.
I have not updated the version of the patch as I have not made any
change in v32-0001 and v32-0002 patch.

Thanks and regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
vignesh C
Date:
On Thu, 21 Mar 2024 at 09:50, Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Mar 18, 2024, at 10:52 AM, Peter Eisentraut wrote:
>
> On 16.03.24 16:42, Euler Taveira wrote:
> > I'm attaching a new version (v30) that adds:
>
> I have some review comments and attached a patch with some smaller
> fixups (mainly message wording and avoid fixed-size string buffers).
>
>
> Thanks for your review. I'm attaching a new version (v32) that includes your
> fixups, merges the v30-0002 into the main patch [1], addresses Hayato's review[2],
> your reviews [3][4], and fixes the query for set_replication_progress() [5].
>
> * doc/src/sgml/ref/pg_createsubscriber.sgml
>
> I would remove the "How It Works" section.  This is not relevant to
> users, and it is very detailed and will require updating whenever the
> implementation changes.  It could be a source code comment instead.
>
>
> It uses the same structure as pg_rewind that also describes how it works
> internally. I included a separate patch that completely removes the section.
>
> * src/bin/pg_basebackup/pg_createsubscriber.c
>
> I think the connection string handling is not robust against funny
> characters, like spaces, in database names etc.
>
>
> get_base_conninfo() uses PQconninfoParse to parse the connection string. I
> expect PQconnectdb to provide a suitable error message in this case. Even if it
> builds keywords and values arrays, it is also susceptible to the same issue, no?
>
> Most SQL commands need to be amended for proper identifier or string
> literal quoting and/or escaping.
>
>
> I completely forgot about this detail when I added the new options in v30. It is
> fixed now. I also changed the tests to exercise it.
>
> In check_subscriber(): All these permissions checks seem problematic
> to me.  We shouldn't reimplement our own copy of the server's
> permission checks.  The server can check the permissions.  And if the
> permission checking in the server ever changes, then we have
> inconsistencies to take care of.  Also, the error messages "permission
> denied" are inappropriate, because we are not doing the actual thing.
> Maybe we want to do a dry-run for the benefit of the user, but then we
> should do the actual thing, like try to create a replication slot, or
> whatever.  But I would rather just remove all this, it seems too
> problematic.
>
>
> The main goal of the check_* functions is to minimize errors during execution.
> I removed the permission checks. The GUC checks were kept.
>
> In main(): The first check if the standby is running is problematic.
> I think it would be better to require that the standby is initially
> shut down.  Consider, the standby might be running under systemd.
> This tool will try to stop it, systemd will try to restart it.  Let's
> avoid these kinds of battles.  It's also safer if we don't try to
> touch running servers.
>
>
> That's a good point. I hadn't found an excuse to simplify this but you provided
> one. :) There was a worry about ignoring some command-line options that change
> GUCs if the server was started. There was also an ugly case for dry run mode
> that had to start the server again (if it was running) at the end. Both cases
> are no longer issues. The current code provides a suitable error if the target
> server is running.
>
> The -p option (--subscriber-port) doesn't seem to do anything.  In my
> testing, it always uses the compiled-in default port.
>
>
> It works for me. See this snippet from the regression tests. The port (50945) is
> used by pg_ctl.
>
> # Running: pg_createsubscriber --verbose --verbose --pgdata /c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata --publisher-server port=50943 host=/tmp/qpngb0bPKo dbname='pg1' --socket-directory /tmp/qpngb0bPKo --subscriber-port 50945 --database pg1 --database pg2
> pg_createsubscriber: validating connection string on publisher
> .
> .
> pg_createsubscriber: pg_ctl command is: "/c/pg_createsubscriber/tmp_install/c/pg_createsubscriber/bin/pg_ctl" start -D "/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata" -s -o "-p 50945" -o "-c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/tmp/qpngb0bPKo'"
> 2024-03-20 18:15:24.517 -03 [105195] LOG:  starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
> 2024-03-20 18:15:24.517 -03 [105195] LOG:  listening on Unix socket "/tmp/qpngb0bPKo/.s.PGSQL.50945"
>
> Printing all the server log lines to the terminal doesn't seem very
> user-friendly.  Not sure what to do about that, short of keeping a
> pg_upgrade-style directory of log files.  But it's ugly.
>
>
> I removed the previous implementation that creates a new directory and stores
> the log file there. I don't like the pg_upgrade-style directory because (a) it
> stores part of the server log files in another place and (b) it is another
> directory to ignore if your tool handles the data directory (like a backup
> tool). My last test said it prints 35 server log lines. I expect that the user
> redirects the output to a file so he/she can inspect it later if required.

Here are a few suggestions:
1) I feel displaying the server log on the console is not a good idea;
I would prefer it to be logged. There were similar comments from
Kuroda-san at [1] and Peter at [2]. The number of lines will increase
based on the log level set. If you don't want to use the pg_upgrade
style, how about exposing a log file option and logging to the
specified log file?

2) Currently, for publication, replication-slot and subscription, we
have to specify these options once per database. If we have 100
databases, we will have to specify these options 100 times, which is
not user friendly. How about something like what Tomas proposed at [3]
and Amit proposed at [4]? It would be better if the user only had to
specify the publication, replication slot and subscription options
once.
+       /* Number of object names must match number of databases */
+       if (num_pubs > 0 && num_pubs != num_dbs)
+       {
+               pg_log_error("wrong number of publication names");
+               pg_log_error_hint("Number of publication names (%d)
must match number of database names (%d).",
+                                                 num_pubs, num_dbs);
+               exit(1);
+       }

3) Can we have an option similar to dry-run which will display the
configurations required on the primary and standby nodes, something
like:
pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres"  -p 9999
-s /home/vignesh/postgres/inst/bin/ -U postgres -d db1 -d db2
--suggest-config
Suggested optimal configurations in the primary:
--------------------------------------
wal_level = logical
max_replication_slots = 3
max_wal_senders = 3
...
Suggested optimal configurations in the standby:
--------------------------------------
max_replication_slots = 3
max_wal_senders = 3
...

[1] -
https://www.postgresql.org/message-id/TY3PR01MB9889593399165B9A04106741F5662%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/7a970912-0b77-4942-84f7-2c9ca0bc05a5%40eisentraut.org
[3] - https://www.postgresql.org/message-id/6423dfeb-a729-45d3-b71e-7bf1b3adb0c9%40enterprisedb.com
[4] - https://www.postgresql.org/message-id/CAA4eK1%2BL_J4GYES6g19xqfpEVD3K2NPCAeu3tzrJfZgu_Fk9Tw%40mail.gmail.com

Regards,
Vignesh



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 21.03.24 12:35, vignesh C wrote:
> Here are a few suggestions:
> 1) I felt displaying the server log to the console is not a good idea,
> I prefer this to be logged. There were similar comments from
> Kuroda-san at [1], Peter at [2]. The number of lines will increase
> based on the log level set. If you don't want to use pg_upgrade style,
> how about exposing the log file option and logging it to the specified
> log file.

Let's leave that for the next version.  We need to wrap things up for 
this release.

> 2) Currently for publication, replication-slot and subscription, we
> will have to specify these options based on the number of databases.
> Here if we have 100 databases we will have to specify these options
> 100 times, it might not be user friendly. How about something like
> what Tomas had proposed at [3] and  Amit proposed at [4]. It will be
> better if the user has to just specify publication, replication slot
> and subscription options only one time.

Same.  Designing, implementing, discussing, and testing this cannot be 
done in the time remaining.

> +       /* Number of object names must match number of databases */
> +       if (num_pubs > 0 && num_pubs != num_dbs)
> +       {
> +               pg_log_error("wrong number of publication names");
> +               pg_log_error_hint("Number of publication names (%d)
> must match number of database names (%d).",
> +                                                 num_pubs, num_dbs);
> +               exit(1);
> +       }
> 
> 3) Can we have an option similar to dry-run which will display the
> configurations required in the primary and standby node something
> like:
> pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres"  -p 9999
> -s /home/vignesh/postgres/inst/bin/ -U postgres -d db1 -d db2
> --suggest-config
> Suggested optimal configurations in the primary:
> --------------------------------------
> wal_level = logical
> max_replication_slots = 3
> max_wal_senders = 3
> ...
> Suggested optimal configurations in the standby:
> --------------------------------------
> max_replication_slots = 3
> max_wal_senders = 3
> ...

How would this be different from what --dry-run does now?




Re: speed up a logical replica setup

From
vignesh C
Date:
On Thu, 21 Mar 2024 at 18:02, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 21.03.24 12:35, vignesh C wrote:
> > Here are a few suggestions:
> > 1) I felt displaying the server log to the console is not a good idea,
> > I prefer this to be logged. There were similar comments from
> > Kuroda-san at [1], Peter at [2]. The number of lines will increase
> > based on the log level set. If you don't want to use pg_upgrade style,
> > how about exposing the log file option and logging it to the specified
> > log file.
>
> Let's leave that for the next version.  We need to wrap things up for
> this release.

Ok, this can be done as an improvement.

> > 2) Currently for publication, replication-slot and subscription, we
> > will have to specify these options based on the number of databases.
> > Here if we have 100 databases we will have to specify these options
> > 100 times, it might not be user friendly. How about something like
> > what Tomas had proposed at [3] and  Amit proposed at [4]. It will be
> > better if the user has to just specify publication, replication slot
> > and subscription options only one time.
>
> Same.  Designing, implementing, discussing, and testing this cannot be
> done in the time remaining.

If we commit this, we might not be able to change the way the option
behaves once customers start using it. How about removing these
options in the first version and adding them in the next version after
more discussion?

> > 3) Can we have an option similar to dry-run which will display the
> > configurations required in the primary and standby node something
> > like:
> > pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres"  -p 9999
> > -s /home/vignesh/postgres/inst/bin/ -U postgres -d db1 -d db2
> > --suggest-config
> > Suggested optimal configurations in the primary:
> > --------------------------------------
> > wal_level = logical
> > max_replication_slots = 3
> > max_wal_senders = 3
> > ...
> > Suggested optimal configurations in the standby:
> > --------------------------------------
> > max_replication_slots = 3
> > max_wal_senders = 3
> > ...
>
> How would this be different from what --dry-run does now?

Currently dry-run does the checks and might fail after identifying
only a few failures, e.g. after checking the subscriber
configuration. The user then has to correct the configuration,
re-run, and fix the next set of failures. In contrast, suggest-config
would display all the optimal configuration for both the primary and
the standby in a single shot. This is not a must for the first
version; it can be done as a subsequent enhancement.

Regards,
Vignesh



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
If we commit this we might not be able to change the way the option
behaves once the customers starts using it. How about removing these
options in the first version and adding it in the next version after
more discussion.

We don't need to redesign this one if we want to add a format string in a
future version. A long time ago, pg_dump started to accept patterns for tables
without breaking or deprecating the -t option. If you have 100 databases and
you don't want to specify the options or use a script to generate them for you,
you also have the option to let pg_createsubscriber generate the object names
for you. In my experience, it will be a rare case.

Currently dry-run will do the check and might fail on identifying a
few failures like after checking subscriber configurations. Then the
user will have to correct the configuration and re-run then fix the
next set of failures. Whereas the suggest-config will display all the
optimal configuration for both the primary and the standby in a single
shot. This is not a must in the first version, it can be done as a
subsequent enhancement.

You meant publisher, right? In execution order, check_subscriber is done before
check_publisher, and it checks all settings on the subscriber before exiting. In
v30, I changed the way it reports the required settings. A previous version
failed as soon as it found a wrong setting; the current version checks all
settings on that server before providing a suitable error.

pg_createsubscriber: checking settings on publisher
pg_createsubscriber: primary has replication slot "physical_slot"
pg_createsubscriber: error: publisher requires wal_level >= logical
pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.

If you have such an error, you will fix them all and rerun using dry run mode
again to verify everything is ok. I don't have a strong preference about it. It
can be changed easily (unifying the check functions or providing a return for
each of the check functions).
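
To illustrate the "providing a return for each of the check functions" idea, here is a sketch (illustration only, with hypothetical stand-ins for the real check_subscriber()/check_publisher()): each check reports all of its problems and returns a status instead of exiting, and the caller exits only after both servers have been examined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real check functions: each one runs all of
 * its checks, logs every problem it finds, and returns false on failure
 * instead of calling exit(1) itself. */
static bool
check_subscriber_sketch(void)
{
	bool		ok = true;

	/* ... check GUCs etc. on the subscriber; set ok = false on each error ... */
	return ok;
}

static bool
check_publisher_sketch(void)
{
	bool		ok = true;

	/* ... check wal_level, slots, wal senders; set ok = false on each error ... */
	return ok;
}

/* Run every check on both servers before deciding to bail out, so that a
 * single dry run reports all problems at once. */
int
run_all_checks(void)
{
	bool		ok = true;

	if (!check_subscriber_sketch())
		ok = false;
	if (!check_publisher_sketch())
		ok = false;
	return ok ? 0 : 1;
}
```

With this shape, a dry run surfaces the subscriber's and publisher's misconfigurations in one pass instead of one server at a time.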


--
Euler Taveira

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
There is a compilation error while building postgres with the patch
due to a recent commit. I have attached a top-up patch v32-0003 to
resolve this compilation error.
I have not updated the version of the patch as I have not made any
change in v32-0001 and v32-0002 patch.

I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
main patch (v32-0001). This version also includes 2 new tests:

- refuse to run if the standby server is running
- refuse to run if the standby was promoted, i.e. it is not in recovery

The first one exercises a recent change (standby should be stopped) and the
second one covers an important requirement.

Based on the discussion [1] about the check functions, Vignesh suggested that it
should check both servers before exiting. v33-0003 implements it. I don't have a
strong preference; feel free to apply it.




--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Thu, Mar 21, 2024 at 8:00 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
>
> If we commit this we might not be able to change the way the option
> behaves once the customers starts using it. How about removing these
> options in the first version and adding it in the next version after
> more discussion.
>
>
> We don't need to redesign this one if we want to add a format string in a next
> version. A long time ago, pg_dump started to accept pattern for tables without
> breaking or deprecating the -t option. If you have 100 databases and you don't
> want to specify the options or use a script to generate it for you, you also
> have the option to let pg_createsubscriber generate the object names for you.
> Per my experience, it will be a rare case.
>

But, why go with some UI in the first place which we don't think is a
good one, or at least don't have a broader agreement that it is a good
one? The problem with self-generated names for users could be that
they won't be able to make much out of it. If one has to always use
those internally then probably that would be acceptable. I would
prefer what Tomas proposed a few emails ago as compared to this one as
that has fewer options to be provided by users but still, they can
later identify objects. But surely, we should discuss if you or others
have better alternatives.

The user choosing to convert a physical standby to a subscriber would
in some cases be interested in converting it for all the databases
(say for the case of upgrade [1]) and giving options for each database
would be cumbersome for her.

> Currently dry-run will do the check and might fail on identifying a
> few failures like after checking subscriber configurations. Then the
> user will have to correct the configuration and re-run then fix the
> next set of failures. Whereas the suggest-config will display all the
> optimal configuration for both the primary and the standby in a single
> shot. This is not a must in the first version, it can be done as a
> subsequent enhancement.
>
>
> Do you meant publisher, right? Per order, check_subscriber is done before
> check_publisher and it checks all settings on the subscriber before exiting. In
> v30, I changed the way it provides the required settings. In a previous version,
> it fails when it found a wrong setting; the current version, check all settings
> from that server before providing a suitable error.
>
> pg_createsubscriber: checking settings on publisher
> pg_createsubscriber: primary has replication slot "physical_slot"
> pg_createsubscriber: error: publisher requires wal_level >= logical
> pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
> pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
> pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
> pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.
>
> If you have such an error, you will fix them all and rerun using dry run mode
> again to verify everything is ok. I don't have a strong preference about it. It
> can be changed easily (unifying the check functions or providing a return for
> each of the check functions).
>

We can unify the checks but not sure if it is worth it. I am fine
either way. It would have been better if we provided a way for a user
to know the tool's requirement rather than letting him know via errors
but I think it should be okay to extend it later as well.

[1] - This tool can be used in an upgrade where the user first
converts physical standby to subscriber to get incremental changes and
then performs an online upgrade.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Fabrízio de Royes Mello
Date:

On Fri, Mar 22, 2024 at 12:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> The user choosing to convert a physical standby to a subscriber would
> in some cases be interested in converting it for all the databases
> (say for the case of upgrade [1]) and giving options for each database
> would be cumbersome for her.
>
> ...
>
> [1] - This tool can be used in an upgrade where the user first
> converts physical standby to subscriber to get incremental changes and
> then performs an online upgrade.
>

The first use case Euler and I discussed some time ago for upstreaming this tool was exactly what Amit described, so IMHO we should make it easier for users to subscribe to all existing user databases.

Regards,

--
Fabrízio de Royes Mello

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Fri, Mar 22, 2024 at 9:44 AM Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
>
> On Fri, Mar 22, 2024 at 12:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > The user choosing to convert a physical standby to a subscriber would
> > in some cases be interested in converting it for all the databases
> > (say for the case of upgrade [1]) and giving options for each database
> > would be cumbersome for her.
> >
> > ...
> >
> > [1] - This tool can be used in an upgrade where the user first
> > converts physical standby to subscriber to get incremental changes and
> > then performs an online upgrade.
> >
>
> The first use case me and Euler discussed some time ago to upstream this tool was exactly what Amit described so IMHO we should make it easier for users to subscribe to all existing user databases.
>

I feel that will be a good use case for this tool, especially now
(with this release) when we allow upgrade of subscriptions. In this
regard, I feel that if the user doesn't specify any database, we
should create subscriptions for all databases, and the user should
have a way to provide publication/slot/subscription names.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
vignesh C
Date:
On Fri, 22 Mar 2024 at 09:01, Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>
> There is a compilation error while building postgres with the patch
> due to a recent commit. I have attached a top-up patch v32-0003 to
> resolve this compilation error.
> I have not updated the version of the patch as I have not made any
> change in v32-0001 and v32-0002 patch.
>
>
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
> main patch (v32-0001). This version also includes 2 new tests:
>
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery
>
> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.

Few comments:
1) In error cases, PQclear and PQfinish should be called:
+       /* Secure search_path */
+       res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not clear search_path: %s",
+                                        PQresultErrorMessage(res));
+               if (exit_on_error)
+                       exit(1);
+
+               return NULL;
+       }
+       PQclear(res);
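
The fix follows the usual libpq error-path convention: release the result and close the connection before bailing out. A stdlib-only sketch of that pattern (the types and helpers below are hypothetical stand-ins for PGresult/PGconn and PQexec/PQclear/PQfinish, used so the shape of the fix is visible):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for libpq objects. */
typedef struct Result { int status; } Result;
typedef struct Conn { int open; } Conn;

#define RESULT_OK 1

static Result *exec_query(Conn *conn) { (void) conn; Result *r = malloc(sizeof(Result)); r->status = RESULT_OK; return r; }
static void clear_result(Result *r) { free(r); }		/* stands in for PQclear */
static void close_conn(Conn *c) { c->open = 0; }		/* stands in for PQfinish */

/* On error, clean up BOTH the result and the connection before returning,
 * which is what the review comment asks the quoted hunk to do. */
static Conn *
secure_search_path(Conn *conn, int exit_on_error)
{
	Result	   *res = exec_query(conn);

	if (res->status != RESULT_OK)
	{
		fprintf(stderr, "could not clear search_path\n");
		clear_result(res);		/* was missing in the quoted hunk */
		close_conn(conn);		/* ditto */
		if (exit_on_error)
			exit(1);
		return NULL;
	}
	clear_result(res);
	return conn;
}
```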

2) Call fflush before calling a system command to get properly
ordered console output:
a) Call fflush:
+               int                     rc = system(cmd_str);
+
+               if (rc == 0)
+                       pg_log_info("subscriber successfully changed
the system identifier");
+               else
+                       pg_fatal("subscriber failed to change system
identifier: exit code: %d", rc);
+       }

b) similarly here:
+       pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd->data);
+       rc = system(pg_ctl_cmd->data);
+       pg_ctl_status(pg_ctl_cmd->data, rc);
+       standby_running = true;

c) similarly here:
+       pg_ctl_cmd = psprintf("\"%s\" stop -D \"%s\" -s", pg_ctl_path,
+                                                 datadir);
+       pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd);
+       rc = system(pg_ctl_cmd);
+       pg_ctl_status(pg_ctl_cmd, rc);
+       standby_running = false;
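
The reason for the fflush() is that when stdout is redirected to a file or pipe, stdio switches to block buffering, so the child process's output can land before the parent's earlier, still-buffered messages. A minimal runnable sketch of the suggested pattern (function name is hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Flush our own buffered output before spawning a child process, so the
 * parent's log lines reach the file/pipe before the child's output. */
int
run_command_flushed(const char *cmd)
{
	printf("running command: %s\n", cmd);
	fflush(stdout);				/* the suggested fix: flush before system() */
	fflush(stderr);
	return system(cmd);
}
```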

3) Call PQclear in error cases:
a) Call PQclear
+       res = PQexec(conn, "SELECT system_identifier FROM
pg_catalog.pg_control_system()");
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not get system identifier: %s",
+                                        PQresultErrorMessage(res));
+               disconnect_database(conn, true);
+       }

b) similarly here
+       if (PQntuples(res) != 1)
+       {
+               pg_log_error("could not get system identifier: got %d
rows, expected %d row",
+                                        PQntuples(res), 1);
+               disconnect_database(conn, true);
+       }
+

c) similarly here
+       res = PQexec(conn,
+                                "SELECT oid FROM pg_catalog.pg_database "
+                                "WHERE datname =
pg_catalog.current_database()");
+       if (PQresultStatus(res) != PGRES_TUPLES_OK)
+       {
+               pg_log_error("could not obtain database OID: %s",
+                                        PQresultErrorMessage(res));
+               disconnect_database(conn, true);
+       }
+

d) There are few more places like this.

4) Since these are global variables, we might be able to remove the
initialization of many of them.
+       /* Default settings */
+       subscriber_dir = NULL;
+       opt.config_file = NULL;
+       opt.pub_conninfo_str = NULL;
+       opt.socket_dir = NULL;
+       opt.sub_port = DEFAULT_SUB_PORT;
+       opt.sub_username = NULL;
+       opt.database_names = (SimpleStringList){0};
+       opt.recovery_timeout = 0;
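
The reason the explicit initialization could go away is that objects with static storage duration are zero-initialized per the C standard, so the NULL/0/{0} assignments are redundant; only non-zero defaults (like DEFAULT_SUB_PORT) need explicit code. A small sketch, using a hypothetical miniature of the options struct:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the tool's options struct. */
struct options
{
	const char *config_file;
	const char *socket_dir;
	int			sub_port;
	int			recovery_timeout;
};

/* File scope => static storage duration => zero-initialized at startup:
 * pointers are NULL and integers are 0 without any explicit assignment. */
static struct options opt;

#define DEFAULT_SUB_PORT 50432	/* hypothetical default value */

void
set_defaults(void)
{
	/* Only the non-zero default actually needs an assignment. */
	opt.sub_port = DEFAULT_SUB_PORT;
}
```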

Regards,
Vignesh



Re: speed up a logical replica setup

From
Shubham Khanna
Date:
On Fri, Mar 22, 2024 at 9:02 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>
> There is a compilation error while building postgres with the patch
> due to a recent commit. I have attached a top-up patch v32-0003 to
> resolve this compilation error.
> I have not updated the version of the patch as I have not made any
> change in v32-0001 and v32-0002 patch.
>
>
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
> main patch (v32-0001). This version also includes 2 new tests:
>
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery
>
> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.
>
> Based on the discussion [1] about the check functions, Vignesh suggested that it
> should check both server before exiting. v33-0003 implements it. I don't have a
> strong preference; feel free to apply it.
>
>
> [1] https://www.postgresql.org/message-id/CALDaNm1Dg5tDRmaabk%2BZND4WF17NrNq52WZxCE%2B90-PGz5trQQ%40mail.gmail.com

I had run valgrind with pg_createsubscriber to see if there were any
issues. Valgrind reported the following issues:
==651272== LEAK SUMMARY:
==651272==    definitely lost: 1,319 bytes in 18 blocks
==651272==    indirectly lost: 1,280 bytes in 2 blocks
==651272==      possibly lost: 44 bytes in 3 blocks
==651272==    still reachable: 3,066 bytes in 22 blocks
==651272==         suppressed: 0 bytes in 0 blocks
==651272==
==651272== For lists of detected and suppressed errors, rerun with: -s
==651272== ERROR SUMMARY: 17 errors from 17 contexts (suppressed: 0 from 0)
The attached report has the details of the same.

Thanks and Regards:
Shubham Khanna.

Attachment

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit, Euler,

I also want to share my opinion, just in case.

> On Thu, Mar 21, 2024 at 8:00 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
> >
> > If we commit this we might not be able to change the way the option
> > behaves once the customers starts using it. How about removing these
> > options in the first version and adding it in the next version after
> > more discussion.
> >
> >
> > We don't need to redesign this one if we want to add a format string in a next
> > version. A long time ago, pg_dump started to accept pattern for tables without
> > breaking or deprecating the -t option. If you have 100 databases and you don't
> > want to specify the options or use a script to generate it for you, you also
> > have the option to let pg_createsubscriber generate the object names for you.
> > Per my experience, it will be a rare case.
> >
> 
> But, why go with some UI in the first place which we don't think is a
> good one, or at least don't have a broader agreement that it is a good
> one? The problem with self-generated names for users could be that
> they won't be able to make much out of it. If one has to always use
> those internally then probably that would be acceptable. I would
> prefer what Tomas proposed a few emails ago as compared to this one as
> that has fewer options to be provided by users but still, they can
> later identify objects. But surely, we should discuss if you or others
> have better alternatives.

IIUC, the added options were inspired by Tomas. And he said the limitation
(that pub/sub/slot names cannot be specified) was acceptable for the first
version. I agree with him. (To be honest, I feel that there should be fewer
options for the first release.)

> The user choosing to convert a physical standby to a subscriber would
> in some cases be interested in converting it for all the databases
> (say for the case of upgrade [1]) and giving options for each database
> would be cumbersome for her.

+1 for the primary use case.

> > Currently dry-run will do the check and might fail on identifying a
> > few failures like after checking subscriber configurations. Then the
> > user will have to correct the configuration and re-run then fix the
> > next set of failures. Whereas the suggest-config will display all the
> > optimal configuration for both the primary and the standby in a single
> > shot. This is not a must in the first version, it can be done as a
> > subsequent enhancement.
> >
> >
> > Do you meant publisher, right? Per order, check_subscriber is done before
> > check_publisher and it checks all settings on the subscriber before exiting. In
> > v30, I changed the way it provides the required settings. In a previous version,
> > it fails when it found a wrong setting; the current version, check all settings
> > from that server before providing a suitable error.
> >
> > pg_createsubscriber: checking settings on publisher
> > pg_createsubscriber: primary has replication slot "physical_slot"
> > pg_createsubscriber: error: publisher requires wal_level >= logical
> > pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
> > pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
> > pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
> > pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.
> >
> > If you have such an error, you will fix them all and rerun using dry run mode
> > again to verify everything is ok. I don't have a strong preference about it. It
> > can be changed easily (unifying the check functions or providing a return for
> > each of the check functions).
> >
> 
> We can unify the checks but not sure if it is worth it. I am fine
> either way. It would have been better if we provided a way for a user
> to know the tool's requirement rather than letting him know via errors
> but I think it should be okay to extend it later as well.

Both ways are OK, but I prefer to unify the checks a bit. The number of working
modes in the same executable should be reduced as much as possible.

Also, I feel the current behavior is acceptable. pg_upgrade checks one cluster
at a time and exits early in bad cases: if both the old and new clusters have
issues, the first run only reports the old one's issues. After the DBA fixes
them and runs again, issues on the new cluster are reported.

[1]: https://www.postgresql.org/message-id/8d52c226-7e34-44f7-a919-759bf8d81541%40enterprisedb.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
> IIUC, added options were inspired by Tomas. And he said the limitation
> (pub/sub/slot
> name cannot be specified) was acceptable for the first version. I agree with him.
> (To be honest, I feel that options should be fewer for the first release)

Just to confirm - I don't think it is strictly needed. If we can agree on a
specification at some point, it's OK for me to add them. If you ask me, I still
prefer Tomas's approach.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

> I had run valgrind with pg_createsubscriber to see if there were any
> issues.

Thanks for running the tool!

> Valgrind reported the following issues:
> ==651272== LEAK SUMMARY:
> ==651272==    definitely lost: 1,319 bytes in 18 blocks
> ==651272==    indirectly lost: 1,280 bytes in 2 blocks
> ==651272==      possibly lost: 44 bytes in 3 blocks
> ==651272==    still reachable: 3,066 bytes in 22 blocks
> ==651272==         suppressed: 0 bytes in 0 blocks
> ==651272==
> ==651272== For lists of detected and suppressed errors, rerun with: -s
> ==651272== ERROR SUMMARY: 17 errors from 17 contexts (suppressed: 0 from
> 0)
> The attached report has the details of the same.

I read the report. I'm not sure all entries must be fixed. Based on other client
tools, old discussions [1], and the current code, I thought we could derive the
following rules:

* For global variables (and their attributes), no need to free the allocated memory.
* For local variables (and their attributes) in main(), no need to free the allocated memory.
* For local variables in other functions, they should be free'd at the end of the function.

Per the above rules and your report, I made a top-up patch which adds pg_free() and
destroyPQExpBuffer() in several places. What do you think?

[1]: https://www.postgresql.org/message-id/40595e73-c7e1-463a-b8be-49792e870007%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 22.03.24 04:31, Euler Taveira wrote:
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>> There is a compilation error while building postgres with the patch
>> due to a recent commit. I have attached a top-up patch v32-0003 to
>> resolve this compilation error.
>> I have not updated the version of the patch as I have not made any
>> change in v32-0001 and v32-0002 patch.
> 
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) 
> into the
> main patch (v32-0001). This version also includes 2 new tests:
> 
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery
> 
> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.

I have committed your version v33.  I did another pass over the 
identifier and literal quoting.  I added quoting for replication slot 
names, for example, even though they can only contain a restricted set 
of characters, but it felt better to be defensive there.

I'm happy to entertain follow-up patches on some of the details like 
option naming that were still being discussed.  I just wanted to get the 
main functionality in in good time.  We can fine-tune the rest over the 
next few weeks.

> Based on the discussion [1] about the check functions, Vignesh suggested 
> that it
> should check both server before exiting. v33-0003 implements it. I don't 
> have a
> strong preference; feel free to apply it.

I haven't done anything about this.




Re: speed up a logical replica setup

From
Bharath Rupireddy
Date:
On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have committed your version v33.

Looks like BF animals aren't happy, please check -
https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: speed up a logical replica setup

From
Peter Eisentraut
Date:
On 25.03.24 13:36, Bharath Rupireddy wrote:
> On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> I have committed your version v33.
> 
> Looks like BF animals aren't happy, please check -
> https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.

Looks like sanitizer failures.  There were a few messages about that 
recently, but those were all just about freeing memory after use, which 
we don't necessarily require for client programs.  So maybe something else.




Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Mar 25, 2024, at 8:55 AM, Peter Eisentraut wrote:
On 22.03.24 04:31, Euler Taveira wrote:
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>> There is a compilation error while building postgres with the patch
>> due to a recent commit. I have attached a top-up patch v32-0003 to
>> resolve this compilation error.
>> I have not updated the version of the patch as I have not made any
>> change in v32-0001 and v32-0002 patch.

> I'm attaching a new version (v33) to incorporate this fix (v32-0003) 
> into the
> main patch (v32-0001). This version also includes 2 new tests:

> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery

> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.

I have committed your version v33.  I did another pass over the 
identifier and literal quoting.  I added quoting for replication slot 
names, for example, even though they can only contain a restricted set 
of characters, but it felt better to be defensive there.
Thanks.

I'm happy to entertain follow-up patches on some of the details like 
option naming that were still being discussed.  I just wanted to get the 
main functionality in in good time.  We can fine-tune the rest over the 
next few weeks.
Agree. Let's continue the discussion about the details.

> Based on the discussion [1] about the check functions, Vignesh suggested 
> that it
> should check both server before exiting. v33-0003 implements it. I don't 
> have a
> strong preference; feel free to apply it.

I haven't done anything about this.
... including this one.


--
Euler Taveira

RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Bharath, Peter,

> Looks like BF animals aren't happy, please check -
> > https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.
> 
> Looks like sanitizer failures.  There were a few messages about that
> recently, but those were all just about freeing memory after use, which
> we don't necessarily require for client programs.  So maybe something else.

It seems that there are several types of failures, [1] and [2].

## Analysis for failure 1

The failure is caused by a time lag between the walreceiver shutting down and
pg_is_in_recovery() starting to return false.

According to the output [1], it seems that the tool failed at wait_for_end_recovery()
with the message "standby server disconnected from the primary". Also, the lines
"redo done at..." and "terminating walreceiver process due to administrator command"
mean that the walreceiver was requested to shut down by XLogShutdownWalRcv().

Looking at the source, we can confirm that the walreceiver is shut down in
StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(), while SharedRecoveryState
is only changed to RECOVERY_STATE_DONE (at which point pg_is_in_recovery() starts
returning false) in the latter part of StartupXLOG().

So, if there is a delay between FinishWalRecovery() and the state change, the check
in wait_for_end_recovery() can fail during that window. Since we allow the
walreceiver to be missing up to 10 times, checked once per second, the failure
occurs if the time lag is longer than 10 seconds.

I do not have a good way to fix it. One approach is to make NUM_CONN_ATTEMPTS
larger, but that's not a fundamental solution.

## Analysis for failure 2

According to [2], the physical replication slot specified as primary_slot_name
was not being used by a walsender process; at that time, no walsender existed yet.

```
...
pg_createsubscriber: publisher: current wal senders: 0
pg_createsubscriber: command is: SELECT 1 FROM pg_catalog.pg_replication_slots WHERE active AND slot_name =
'physical_slot'
pg_createsubscriber: error: could not obtain replication slot information: got 0 rows, expected 1 row
...
```

Currently the standby must be stopped before running the command, and the current
code does not wait for replication to start. So there is a possibility that the
check runs before the walsender is launched.

One possible approach is to wait until the replication starts. An alternative is
to relax the condition.

What do you think?

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-03-25%2013%3A03%3A07
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-03-25%2013%3A53%3A58

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 

Re: speed up a logical replica setup

From
vignesh C
Date:
On Mon, 25 Mar 2024 at 21:36, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Bharath, Peter,
>
> > Looks like BF animals aren't happy, please check -
> > > https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.
> >
> > Looks like sanitizer failures.  There were a few messages about that
> > recently, but those were all just about freeing memory after use, which
> > we don't necessarily require for client programs.  So maybe something else.
>
> It seems that there are several time of failures, [1] and [2].
>
> ## Analysis for failure 1
>
> The failure is caused by a time lag between the walreceiver shutting down and
> pg_is_in_recovery() starting to return false.
>
> According to the output [1], it seems that the tool failed at wait_for_end_recovery()
> with the message "standby server disconnected from the primary". Also, lines
> "redo done at..." and "terminating walreceiver process due to administrator command"
> meant that walreceiver was requested to shut down by XLogShutdownWalRcv().
>
> According to the source, we confirm that walreceiver is shut down in
> StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also, SharedRecoveryState
> is changed to RECOVERY_STATE_DONE (at which point pg_is_in_recovery() returns false)
> at the latter part of StartupXLOG().
>
> So, if there is a delay between FinishWalRecovery() and change the state, the check
> in wait_for_end_recovery() would be failed during the time. Since we allow to miss
> the walreceiver 10 times and it is checked once per second, the failure occurs if
> the time lag is longer than 10 seconds.
>
> I do not have a good way to fix it. One approach is make NUM_CONN_ATTEMPTS larger,
> but it's not a fundamental solution.

I agree with your analysis, another way to fix could be to remove the
following check as increasing the count might still have the race
condition issue:
/*
* If it is still in recovery, make sure the target server is
* connected to the primary so it can receive the required WAL to
* finish the recovery process. If it is disconnected try
* NUM_CONN_ATTEMPTS in a row and bail out if not succeed.
*/
res = PQexec(conn,
"SELECT 1 FROM pg_catalog.pg_stat_wal_receiver");

I'm not sure whether we should worry about the case where recovery is not done
and the walreceiver has exited, as we have the following sanity check in
check_subscriber before we wait for recovery to finish:
/* The target server must be a standby */
if (!server_is_in_recovery(conn))
{
pg_log_error("target server must be a standby");
disconnect_database(conn, true);
}

Regards,
Vignesh



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have committed your version v33.  I did another pass over the
> identifier and literal quoting.  I added quoting for replication slot
> names, for example, even though they can only contain a restricted set
> of characters, but it felt better to be defensive there.
>
> I'm happy to entertain follow-up patches on some of the details like
> option naming that were still being discussed.  I just wanted to get the
> main functionality in in good time.  We can fine-tune the rest over the
> next few weeks.
>

I was looking at prior discussions on this topic to see if there are
any other open design points apart from this and noticed that the
points raised/discussed in the email [1] are also not addressed. IIRC,
the key point we discussed was that after promotion, the existing
replication objects should be removed (either optionally or always),
otherwise, it can lead to a new subscriber not being able to restart
or getting some unwarranted data.

[1] - https://www.postgresql.org/message-id/CAExHW5t4ew7ZrgcDdTv7YmuG7LVQT1ZaEny_EvtngHtEBNyjcQ%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote:
## Analysis for failure 1

The failure is caused by a time lag between the walreceiver shutting down and
pg_is_in_recovery() starting to return false.

According to the output [1], it seems that the tool failed at wait_for_end_recovery()
with the message "standby server disconnected from the primary". Also, lines
"redo done at..." and "terminating walreceiver process due to administrator command"
meant that walreceiver was requested to shut down by XLogShutdownWalRcv().

According to the source, we confirm that walreceiver is shut down in
StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also, SharedRecoveryState
is changed to RECOVERY_STATE_DONE (at which point pg_is_in_recovery() returns false)
at the latter part of StartupXLOG().

So, if there is a delay between FinishWalRecovery() and change the state, the check
in wait_for_end_recovery() would be failed during the time. Since we allow to miss
the walreceiver 10 times and it is checked once per second, the failure occurs if
the time lag is longer than 10 seconds.

I do not have a good way to fix it. One approach is make NUM_CONN_ATTEMPTS larger,
but it's not a fundamental solution.

I was expecting that slow hosts might have issues in wait_for_end_recovery().
As you said it took a lot of steps between FinishWalRecovery() (where
walreceiver is shutdown -- XLogShutdownWalRcv) and SharedRecoveryState is set to
RECOVERY_STATE_DONE. If this window takes longer than NUM_CONN_ATTEMPTS *
WAIT_INTERVAL (10 seconds), it aborts the execution. That's a bad decision
because it already finished the promotion and it is just doing the final
preparation for the host to become a primary.

        /*   
         * If it is still in recovery, make sure the target server is
         * connected to the primary so it can receive the required WAL to
         * finish the recovery process. If it is disconnected try
         * NUM_CONN_ATTEMPTS in a row and bail out if not succeed.
         */
        res = PQexec(conn,
                     "SELECT 1 FROM pg_catalog.pg_stat_wal_receiver");
        if (PQntuples(res) == 0)
        {
            if (++count > NUM_CONN_ATTEMPTS)
            {
                stop_standby_server(subscriber_dir);
                pg_log_error("standby server disconnected from the primary");
                break;
            }    
        }
        else
            count = 0;          /* reset counter if it connects again */

This code was added to defend against the death/crash of the target server. There
are at least 3 options:

(1) increase NUM_CONN_ATTEMPTS * WAIT_INTERVAL seconds. We discussed this constant
and I decided to use 10 seconds because even in some slow hosts, this time
wasn't reached during my tests. It seems I forgot to test the combination of slow
host, asserts enabled, and ubsan. I didn't notice that pg_promote() uses 60
seconds as default wait. Maybe that's a reasonable value. I checked the
004_timeline_switch test and the last run took: 39.2s (serinus), 33.1s
(culicidae), 18.31s (calliphoridae) and 27.52s (olingo).

(2) check if the primary is not running when the walreceiver is not available on the
target server. Increase the connection attempts only if the primary is not running.
Hence, the described case doesn't cause an increment on the count variable.

(3) set the recovery_timeout default to a nonzero value and remove the
pg_stat_wal_receiver check that protects against the death/crash of the target
server. I explained in a previous message that a timeout may occur in cases where
WAL replay takes more than recovery-timeout seconds to reach a consistent state.

Option (1) is the easiest fix; however, we can hit the same issue again if a slow
host decides to be even slower, and we would have to adjust the value again.
Option (2) interprets the walreceiver's absence, while the primary server is still
running, as a sign that the target server is on the verge of finishing recovery.
Option (3) is not as resilient as the other options.

The first patch implements a combination of (1) and (2).

## Analysis for failure 2

According to [2], the physical replication slot which is specified as primary_slot_name
was not used by the walsender process. At that time walsender has not existed.

```
...
pg_createsubscriber: publisher: current wal senders: 0
pg_createsubscriber: command is: SELECT 1 FROM pg_catalog.pg_replication_slots WHERE active AND slot_name = 'physical_slot'
pg_createsubscriber: error: could not obtain replication slot information: got 0 rows, expected 1 row
...
```

Currently standby must be stopped before the command and current code does not
block the flow to ensure the replication is started. So there is a possibility
that the checking is run before walsender is launched.

One possible approach is to wait until the replication starts. Alternative one is
to ease the condition.

That's my suggestion too. I reused NUM_CONN_ATTEMPTS (that was renamed to
NUM_ATTEMPTS in the first patch). See second patch.


--
Euler Taveira

Attachment

Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Mar 25, 2024, at 11:33 PM, Amit Kapila wrote:
On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have committed your version v33.  I did another pass over the
> identifier and literal quoting.  I added quoting for replication slot
> names, for example, even though they can only contain a restricted set
> of characters, but it felt better to be defensive there.
>
> I'm happy to entertain follow-up patches on some of the details like
> option naming that were still being discussed.  I just wanted to get the
> main functionality in in good time.  We can fine-tune the rest over the
> next few weeks.
>

I was looking at prior discussions on this topic to see if there are
any other open design points apart from this and noticed that the
points raised/discussed in the email [1] are also not addressed. IIRC,
the key point we discussed was that after promotion, the existing
replication objects should be removed (either optionally or always),
otherwise, it can lead to a new subscriber not being able to restart
or getting some unwarranted data.

See setup_subscriber.

        /*
         * Since the publication was created before the consistent LSN, it is
         * available on the subscriber when the physical replica is promoted.
         * Remove publications from the subscriber because it has no use.
         */
        drop_publication(conn, &dbinfo[i]);


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Tue, Mar 26, 2024 at 8:27 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Mar 25, 2024, at 11:33 PM, Amit Kapila wrote:
>
> On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > I have committed your version v33.  I did another pass over the
> > identifier and literal quoting.  I added quoting for replication slot
> > names, for example, even though they can only contain a restricted set
> > of characters, but it felt better to be defensive there.
> >
> > I'm happy to entertain follow-up patches on some of the details like
> > option naming that were still being discussed.  I just wanted to get the
> > main functionality in in good time.  We can fine-tune the rest over the
> > next few weeks.
> >
>
> I was looking at prior discussions on this topic to see if there are
> any other open design points apart from this and noticed that the
> points raised/discussed in the email [1] are also not addressed. IIRC,
> the key point we discussed was that after promotion, the existing
> replication objects should be removed (either optionally or always),
> otherwise, it can lead to a new subscriber not being able to restart
> or getting some unwarranted data.
>
>
> See setup_subscriber.
>
>         /*
>          * Since the publication was created before the consistent LSN, it is
>          * available on the subscriber when the physical replica is promoted.
>          * Remove publications from the subscriber because it has no use.
>          */
>         drop_publication(conn, &dbinfo[I]);
>

This only drops the publications created by this tool, not the
pre-existing ones that we discussed in the link provided.

--
With Regards,
Amit Kapila.



RE: speed up a logical replica setup

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit, Euler,

> 
> This only drops the publications created by this tool, not the
> pre-existing ones that we discussed in the link provided.

Another concern here is the case in which the primary subscribes to changes from
other servers. After the conversion, the new subscriber also tries to connect to
those other publishers - this may lead to conflicts. It happens because both the
launcher and the workers start after recovery finishes. So, based on Ashutosh's
point, should we remove such replication objects?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

From
Tomas Vondra
Date:
On 3/26/24 03:53, Euler Taveira wrote:
> On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote:
>> ## Analysis for failure 1
>>
>> The failure is caused by a time lag between the walreceiver shutting down and
>> pg_is_in_recovery() starting to return false.
>>
>> According to the output [1], it seems that the tool failed at wait_for_end_recovery()
>> with the message "standby server disconnected from the primary". Also, lines
>> "redo done at..." and "terminating walreceiver process due to administrator command"
>> meant that walreceiver was requested to shut down by XLogShutdownWalRcv().
>>
>> According to the source, we confirm that walreceiver is shut down in
>> StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also, SharedRecoveryState
>> is changed to RECOVERY_STATE_DONE (at which point pg_is_in_recovery() returns false)
>> at the latter part of StartupXLOG().
>>
>> So, if there is a delay between FinishWalRecovery() and change the state, the check
>> in wait_for_end_recovery() would be failed during the time. Since we allow to miss
>> the walreceiver 10 times and it is checked once per second, the failure occurs if
>> the time lag is longer than 10 seconds.
>>
>> I do not have a good way to fix it. One approach is make NUM_CONN_ATTEMPTS larger,
>> but it's not a fundamental solution.
> 
> I was expecting that slow hosts might have issues in wait_for_end_recovery().
> As you said it took a lot of steps between FinishWalRecovery() (where
> walreceiver is shutdown -- XLogShutdownWalRcv) and SharedRecoveryState is set to
> RECOVERY_STATE_DONE. If this window takes longer than NUM_CONN_ATTEMPTS *
> WAIT_INTERVAL (10 seconds), it aborts the execution. That's a bad decision
> because it already finished the promotion and it is just doing the final
> preparation for the host to become a primary.
> 
>         /*   
>          * If it is still in recovery, make sure the target server is
>          * connected to the primary so it can receive the required WAL to
>          * finish the recovery process. If it is disconnected try
>          * NUM_CONN_ATTEMPTS in a row and bail out if not succeed.
>          */
>         res = PQexec(conn,
>                      "SELECT 1 FROM pg_catalog.pg_stat_wal_receiver");
>         if (PQntuples(res) == 0)
>         {
>             if (++count > NUM_CONN_ATTEMPTS)
>             {
>                 stop_standby_server(subscriber_dir);
>                 pg_log_error("standby server disconnected from the primary");
>                 break;
>             }    
>         }
>         else
>             count = 0;          /* reset counter if it connects again */
> 
> This code was add to defend against the death/crash of the target server. There
> are at least 3 options:
> 
> (1) increase NUM_CONN_ATTEMPTS * WAIT_INTERVAL seconds. We discussed this constant
> and I decided to use 10 seconds because even in some slow hosts, this time
> wasn't reached during my tests. It seems I forgot to test the combination of slow
> host, asserts enabled, and ubsan. I didn't notice that pg_promote() uses 60
> seconds as default wait. Maybe that's a reasonable value. I checked the
> 004_timeline_switch test and the last run took: 39.2s (serinus), 33.1s
> (culicidae), 18.31s (calliphoridae) and 27.52s (olingo).
> 
> (2) check if the primary is not running when the walreceiver is not available on the
> target server. Increase the connection attempts only if the primary is not running.
> Hence, the described case doesn't cause an increment on the count variable.
> 
> (3) set recovery_timeout default to != 0 and remove pg_stat_wal_receiver check
> protection against the death/crash target server. I explained in a previous
> message that timeout may occur in cases that WAL replay to reach consistent
> state takes more than recovery-timeout seconds.
> 
> Option (1) is the easiest fix, however, we can have the same issue again if a
> slow host decides to be even slower, hence, we have to adjust this value again.
> Option (2) interprets the walreceiver absence as a recovery end and if the
> primary server is running it can indicate that the target server is in the
> imminence of the recovery end. Option (3) is not as resilient as the other
> options.
> 
> The first patch implements a combination of (1) and (2).
> 
>> ## Analysis for failure 2
>>
>> According to [2], the physical replication slot which is specified as primary_slot_name
>> was not used by the walsender process. At that time walsender has not existed.
>>
>> ```
>> ...
>> pg_createsubscriber: publisher: current wal senders: 0
>> pg_createsubscriber: command is: SELECT 1 FROM pg_catalog.pg_replication_slots WHERE active AND slot_name =
'physical_slot'
>> pg_createsubscriber: error: could not obtain replication slot information: got 0 rows, expected 1 row
>> ...
>> ```
>>
>> Currently standby must be stopped before the command and current code does not
>> block the flow to ensure the replication is started. So there is a possibility
>> that the checking is run before walsender is launched.
>>
>> One possible approach is to wait until the replication starts. Alternative one is
>> to ease the condition.
> 
> That's my suggestion too. I reused NUM_CONN_ATTEMPTS (that was renamed to
> NUM_ATTEMPTS in the first patch). See second patch.
> 

Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
waited long enough?

IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling
pg_createsubscriber, and that should do the trick.

Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing
things like ubsan/valgrind already use to deal with exactly this kind of
timeout problem.

Or is there a deeper problem with deciding if the system is in recovery?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
waited long enough?

It was an attempt to decouple a connection failure (while WAL is still being
streamed) from the recovery timeout. The NUM_CONN_ATTEMPTS check guarantees that if
the primary is gone during the standby recovery process, there is a way to bail out.
The recovery-timeout is 0 (infinite) by default, so you would have an infinite wait
without this check. The idea behind this implementation is to avoid exiting in this
critical code path; if it times out here you might have to rebuild the standby and
start again. Amit suggested [1] that we use a nonzero recovery-timeout, but how high
is a good value? I've already seen some long recovery processes, using the pglogical
equivalent, that timed out after hundreds of minutes. Maybe I'm too worried about a
small percentage of cases and we should use 1h as the default, for example. It would
reduce the complexity, since the recovery process lacks progress indicators (the LSN
is not sufficient in this case and there isn't a function to report the current
state -- stop applying WAL, reach target, new timeline, etc.).

If we remove the pg_stat_wal_receiver check, we should avoid infinite recovery by
default; otherwise we will get reports saying the tool is hanging when in reality
the primary is gone and no more WAL can be streamed.

IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling
pg_createsubscriber, and that should do the trick.

That's a good idea. Tests are not exercising the recovery-timeout option.

Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing
things like ubsan/valgrind already use to deal with exactly this kind of
timeout problem.

Or is there a deeper problem with deciding if the system is in recovery?

As I said, with some recovery progress indicators it would be easier to make
decisions like waiting a few more seconds because the WAL has already been applied
and the server is creating a new timeline. The recovery timeout decision is a shot
in the dark, because we might be aborting pg_createsubscriber when the target
server is about to set RECOVERY_STATE_DONE.


--
Euler Taveira

Re: speed up a logical replica setup

From
Tomas Vondra
Date:
On 3/26/24 21:17, Euler Taveira wrote:
> On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
>> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
>> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
>> waited long enough?
> 
> It was an attempt to decoupled a connection failure (that keeps streaming the
> WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the primary
> is gone during the standby recovery process, there is a way to bail out. The
> recovery-timeout is 0 (infinite) by default so you have an infinite wait without
> this check. The idea behind this implementation is to avoid exiting in this
> critical code path. If it times out here you might have to rebuild the standby
> and start again.

- This seems like something that should definitely be documented in the
comment before wait_for_end_recovery(). At the moment it only talks
about timeout, and nothing about NUM_CONN_ATTEMPTS.

- The NUM_CONN_ATTEMPTS name seems rather misleading, considering it
does not really count connection attempts, but number of times we have
not seen 1 in pg_catalog.pg_stat_wal_receiver.

- Not sure I follow the logic - it tries to avoid exiting by setting an infinite
timeout, but it still exits based on NUM_CONN_ATTEMPTS. Isn't that somewhat
contradictory?

- Isn't the NUM_CONN_ATTEMPTS actually making it more fragile, i.e. more
likely to exit? For example, what if there's a short networking hiccup,
so that the standby can't connect to the primary.

- It seems a bit strange that even with the recovery timeout set, having
the limit of 10 "connection attempts" effectively establishes a separate
hard-coded limit of 10 seconds. Seems a bit surprising if I set recovery
limit to 1 minute, and it just dies after 10 seconds.

> Amit suggested [1] that we use a value as recovery-timeout but
> how high is a good value? I've already saw some long recovery process using
> pglogical equivalent that timeout out after hundreds of minutes. Maybe I'm too
> worried about a small percentage of cases and we should use 1h as default, for
> example. It would reduce the complexity since the recovery process lacks some
> progress indicators (LSN is not sufficient in this case and there isn't a
> function to provide the current state -- stop applying WAL, reach target, new
> timeline, etc).
> 
> If we remove the pg_stat_wal_receiver check, we should avoid infinite recovery
> by default otherwise we will have some reports saying the tool is hanging when
> in reality the primary has gone and WAL should be streamed.
> 

I don't think there's a default timeout value that would work for
everyone. Either it's going to be too short for some cases, or it'll
take too long for some other cases.

I think there are two obvious default values for the timeout - infinity,
and 60 seconds, which is the default we use for other CLI tools (like
pg_ctl and so on). Considering the negative impact of exiting, I'd say
it's better to default to infinity. It's always possible to Ctrl-C or
terminate the process in some other way, if needed.

As for people complaining about infinite recovery - perhaps it'd be
sufficient to mention this in the messages printed by the tool, to make
it clearer. Or maybe even print something in the loop, because right now
it's entirely silent so it's easy to believe it's stuck. Perhaps not on
every loop, but at least in verbose mode it should print something.

>> IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling
>> pg_createsubscriber, and that should do the trick.
> 
> That's a good idea. Tests are not exercising the recovery-timeout option.
> 
>> Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing
>> things like ubsan/valgrind already use to deal with exactly this kind of
>> timeout problem.
>>
>> Or is there a deeper problem with deciding if the system is in recovery?
> 
> As I said, with some recovery progress indicators it would be easier to make
> decisions like waiting a few seconds because the WAL has already been applied and
> it is creating a new timeline. The recovery timeout decision is a shot in the
> dark because we might be aborting pg_createsubscriber when the target server is
> about to set RECOVERY_STATE_DONE.
> 

Isn't it enough to check data in pg_stat_replication on the primary?
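
(For illustration, a query along these lines on the primary would show whether
the standby is still attached and how far it has replayed; the
application_name value here is a placeholder:)

```sql
-- On the primary: is the standby connected, and where is it?
SELECT application_name, state, replay_lsn
FROM pg_stat_replication
WHERE application_name = 'standby1';
```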


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
>
> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> waited long enough?
>
>
> It was an attempt to decouple a connection failure (that keeps streaming the
> WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the primary
> is gone during the standby recovery process, there is a way to bail out.
>

I think we don't need to check primary if the WAL corresponding to
consistent_lsn is already present on the standby. Shouldn't we first
check that? Once we ensure that the required WAL is copied, just
checking server_is_in_recovery() should be sufficient. I feel that
will be a direct way of ensuring what is required rather than
indirectly verifying the same (by checking pg_stat_wal_receiver) as we
are doing currently.
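
As a sketch of that direction (the LSN literal is a placeholder for the
consistent_lsn computed earlier, and server_is_in_recovery() presumably boils
down to something like the second query):

```sql
-- On the standby: has WAL up to the consistent LSN already been received?
SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), '0/3000148') >= 0;

-- If so, polling this until it returns false should be enough:
SELECT pg_is_in_recovery();
```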

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Tue, Mar 26, 2024 at 8:24 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote:
>
> The first patch implements a combination of (1) and (2).
>
> ## Analysis for failure 2
>
> According to [2], the physical replication slot specified as primary_slot_name
> was not used by any walsender process. At that time, the walsender did not exist yet.
>
> ```
> ...
> pg_createsubscriber: publisher: current wal senders: 0
> pg_createsubscriber: command is: SELECT 1 FROM pg_catalog.pg_replication_slots WHERE active AND slot_name =
'physical_slot'
> pg_createsubscriber: error: could not obtain replication slot information: got 0 rows, expected 1 row
> ...
> ```
>
> Currently the standby must be stopped before the command runs, and the current
> code does not wait for replication to start. So there is a possibility
> that the check is run before the walsender is launched.
> 
> One possible approach is to wait until replication starts. An alternative is
> to relax the condition.
>
>
> That's my suggestion too. I reused NUM_CONN_ATTEMPTS (that was renamed to
> NUM_ATTEMPTS in the first patch). See second patch.
>

How can we guarantee that the slot will become active after this many
attempts? It still could be possible that on some slow machines it
didn't get activated even after NUM_ATTEMPTS. BTW, in the first place,
why do we need to ensure that the 'primary_slot_name' is active on the
primary?

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Mon, Apr 29, 2024, at 6:56 AM, Amit Kapila wrote:
On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
>
> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> waited long enough?
>
>
> It was an attempt to decouple a connection failure (that keeps streaming the
> WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the primary
> is gone during the standby recovery process, there is a way to bail out.
>

I think we don't need to check primary if the WAL corresponding to
consistent_lsn is already present on the standby. Shouldn't we first
check that? Once we ensure that the required WAL is copied, just
checking server_is_in_recovery() should be sufficient. I feel that
will be a direct way of ensuring what is required rather than
indirectly verifying the same (by checking pg_stat_wal_receiver) as we
are doing currently.

How would you check it? WAL file? During recovery, you are not allowed to use
pg_current_wal_lsn.

Tomas suggested to me off-list that we should adopt a simple solution in
wait_for_end_recovery: wait for recovery_timeout without additional checks
(which means removing the pg_stat_wal_receiver logic).  When we have additional
information that we can reliably use in this function, we can add it. This also
makes it easy to adjust PG_TEST_TIMEOUT_DEFAULT to keep the tests stable.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Apr 29, 2024, at 6:56 AM, Amit Kapila wrote:
>
> On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
> >
> > Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> > Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> > waited long enough?
> >
> >
> > It was an attempt to decouple a connection failure (that keeps streaming the
> > WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the primary
> > is gone during the standby recovery process, there is a way to bail out.
> >
>
> I think we don't need to check primary if the WAL corresponding to
> consistent_lsn is already present on the standby. Shouldn't we first
> check that? Once we ensure that the required WAL is copied, just
> checking server_is_in_recovery() should be sufficient. I feel that
> will be a direct way of ensuring what is required rather than
> indirectly verifying the same (by checking pg_stat_wal_receiver) as we
> are doing currently.
>
>
> How would you check it? WAL file? During recovery, you are not allowed to use
> pg_current_wal_lsn.
>

How about pg_last_wal_receive_lsn()?

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, Apr 29, 2024 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira <euler@eulerto.com> wrote:
>

I was trying to test this utility when 'sync_replication_slots' is on
and it gets into an ERROR loop [1] and never finishes. Please find the
postgresql.auto.conf used on the standby attached. I think if the standby
has enabled sync_slots, you need to pass dbname in
GenerateRecoveryConfig(). I couldn't test it further but I wonder if
there are already synced slots on the standby (either due to
'sync_replication_slots' or users have used
pg_sync_replication_slots() before invoking pg_createsubscriber),
those would be retained as it is on new subscriber and lead to
unnecessary WAL retention and dead rows.

[1]
2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
requires dbname to be specified in primary_conninfo
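
For reference, the error above goes away once primary_conninfo carries a
dbname; a sketch of the relevant standby settings (host, user, and database
names are illustrative, and other slot-sync prerequisites such as
primary_slot_name are omitted):

```
# postgresql.auto.conf on the standby (illustrative values)
primary_conninfo = 'host=primary port=5432 user=repluser dbname=postgres'
sync_replication_slots = on
hot_standby_feedback = on
```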

--
With Regards,
Amit Kapila.

Attachment

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Tue, Apr 30, 2024 at 12:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Apr 29, 2024 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira <euler@eulerto.com> wrote:
> >
>
> I was trying to test this utility when 'sync_replication_slots' is on
> and it gets in an ERROR loop [1] and never finishes. Please find the
> postgresql.auto used on the standby attached. I think if the standby
> has enabled sync_slots, you need to pass dbname in
> GenerateRecoveryConfig().

The other possibility is that when we start standby from
pg_createsubscriber, we specifically set 'sync_replication_slots' as
false.

>
> I couldn't test it further but I wonder if
> there are already synced slots on the standby (either due to
> 'sync_replication_slots' or users have used
> pg_sync_replication_slots() before invoking pg_createsubscriber),
> those would be retained as it is on new subscriber and lead to
> unnecessary WAL retention and dead rows.
>

This still needs some handling.

BTW, I don't see the use of following messages in --dry-run mode or
rather they could be misleading:
pg_createsubscriber: hint: If pg_createsubscriber fails after this
point, you must recreate the physical replica before continuing.
...
...
pg_createsubscriber: setting the replication progress (node name
"pg_0" ; LSN 0/0) on database "postgres"

Similarly, we should think if below messages are useful in --dry-run mode:
pg_createsubscriber: dropping publication
"pg_createsubscriber_5_887f7991" on database "postgres"
pg_createsubscriber: creating subscription
"pg_createsubscriber_5_887f7991" on database "postgres"
...
pg_createsubscriber: enabling subscription
"pg_createsubscriber_5_887f7991" on database "postgres"

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Thomas Munro
Date:
040_pg_createsubscriber.pl seems to be failing occasionally on
culicidae near "--dry-run on node S".  I couldn't immediately see why.
That animal is using EXEC_BACKEND and I also saw a one-off failure a
bit like that on my own local Linux + EXEC_BACKEND test run
(sorry I didn't keep the details around).  Coincidence?



Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Sun, May 19, 2024 at 4:25 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> 040_pg_createsubscriber.pl seems to be failing occasionally on
> culicidae near "--dry-run on node S".  I couldn't immediately see why.
> That animal is using EXEC_BACKEND and I also saw a one-off failure a
> bit like that on my own local Linux + EXEC_BACKEND test run
> (sorry I didn't keep the details around).  Coincidence?
>

AFAICS, this is the same as one of the two BF failures being discussed
in this thread.

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
Shlok Kyal
Date:
Hi,
>
> I was trying to test this utility when 'sync_replication_slots' is on
> and it gets in an ERROR loop [1] and never finishes. Please find the
> postgresql.auto used on the standby attached. I think if the standby
> has enabled sync_slots, you need to pass dbname in
> GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> there are already synced slots on the standby (either due to
> 'sync_replication_slots' or users have used
> pg_sync_replication_slots() before invoking pg_createsubscriber),
> those would be retained as it is on new subscriber and lead to
> unnecessary WAL retention and dead rows.
>
> [1]
> 2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
> 2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
> requires dbname to be specified in primary_conninfo

Hi,

I tested the scenario posted by Amit in [1], in which retaining synced
slots lead to unnecessary WAL retention and ERROR. This is raised as
the second open point in [2].
The steps to reproduce the issue:
(1) Setup physical replication with sync slot feature turned on by
setting sync_replication_slots = 'true' or using
pg_sync_replication_slots() on the standby node.
For physical replication setup, run pg_basebackup with -R  and -d option.
(2) Create a logical replication slot on primary node with failover
option as true. A corresponding slot is created on standby as part of
sync slot feature.
(3) Run pg_createsubscriber on standby node.
(4) On checking the replication slots on the standby node, I noticed
that the logical slots created in step 2 are retained.
I have attached the script to reproduce the issue.

Kuroda-san and I worked to resolve the open points. Here are patches to
solve the second and third points in [2].
Patches proposed by Euler are also attached just in case, but they
were not modified.

v2-0001: not changed
v2-0002: not changed
v2-0003: ensures the slot sync is disabled during the conversion. This
resolves the second point.
v2-0004: drops sync slots which may be retained after running. This
resolves the second point.
v2-0005: removes misleading output messages in dry-run. This resolves
the third point.

[1]: https://www.postgresql.org/message-id/CAA4eK1KdCb%2B5sjYu6qCMXXdCX1y_ihr8kFzMozq0%3DP%3DauYxgog%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAA4eK1J22UEfrqx222h5j9DQ7nxGrTbAa_BC%2B%3DmQXdXs-RCsew%40mail.gmail.com

Thanks and Regards,
Shlok Kyal

Attachment

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Mon, May 20, 2024 at 4:30 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > I was trying to test this utility when 'sync_replication_slots' is on
> > and it gets in an ERROR loop [1] and never finishes. Please find the
> > postgresql.auto used on the standby attached. I think if the standby
> > has enabled sync_slots, you need to pass dbname in
> > GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> > there are already synced slots on the standby (either due to
> > 'sync_replication_slots' or users have used
> > pg_sync_replication_slots() before invoking pg_createsubscriber),
> > those would be retained as it is on new subscriber and lead to
> > unnecessary WAL retention and dead rows.
> >
> > [1]
> > 2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
> > 2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
> > requires dbname to be specified in primary_conninfo
>
> Hi,
>
> I tested the scenario posted by Amit in [1], in which retaining synced
> slots lead to unnecessary WAL retention and ERROR. This is raised as
> the second open point in [2].
> The steps to reproduce the issue:
> (1) Setup physical replication with sync slot feature turned on by
> setting sync_replication_slots = 'true' or using
> pg_sync_replication_slots() on the standby node.
> For physical replication setup, run pg_basebackup with -R  and -d option.
> (2) Create a logical replication slot on primary node with failover
> option as true. A corresponding slot is created on standby as part of
> sync slot feature.
> (3) Run pg_createsubscriber on standby node.
> (4) On Checking for the replication slot on standby node, I noticed
> that the logical slots created in step 2 are retained.
>  I have attached the script to reproduce the issue.
>
> I and Kuroda-san worked to resolve open points. Here are patches to
> solve the second and third point in [2].
> Patches proposed by Euler are also attached just in case, but they
> were not modified.
>
> v2-0001: not changed
>

Shouldn't we modify it as per the suggestion given in the email [1]? I
am wondering if we can entirely get rid of checking the primary
and simply rely on recovery_timeout, while we keep checking
server_is_in_recovery(). If so, we can modify the test to use a
non-default recovery_timeout (say 180s or something similar if we have
used it at any other place). As an additional check we can ensure that
consistent_lsn is present on the standby.

> v2-0002: not changed
>

We have added more tries to see if the primary_slot_name becomes
active, but I think it is still fragile because it is possible on slow
machines that the required slot doesn't become active even after more
retries. I have raised the same comment previously [2] and asked an
additional question but didn't get any response.

[1] - https://www.postgresql.org/message-id/CAA4eK1JJq_ER6Kq_H%3DjKHR75QPRd8y9_D%3DRtYw%3DaPYKMfqLi9A%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1LT3Z13Dg6p4Z%2B4adO_EY-ow5CmWfikEmBfL%3DeVrm8CPw%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: speed up a logical replica setup

From
"Euler Taveira"
Date:
On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
>
> v2-0001: not changed
>

Shouldn't we modify it as per the suggestion given in the email [1]? I
am wondering if we can entirely get rid of checking the primary
and simply rely on recovery_timeout, while we keep checking
server_is_in_recovery(). If so, we can modify the test to use a
non-default recovery_timeout (say 180s or something similar if we have
used it at any other place). As an additional check we can ensure that
consistent_lsn is present on the standby.

That's exactly what I want to propose. Tomas convinced me off-list that less is
better: since we don't have a useful recovery progress reporting mechanism to
make sure recovery is still making progress, we should simply wait.

> v2-0002: not changed
>

We have added more tries to see if the primary_slot_name becomes
active but I think it is still fragile because it is possible on slow
machines that the required slot didn't become active even after more
retries. I have raised the same comment previously [2] and asked an
additional question but didn't get any response.

Following the same line that simplifies the code, we can: (a) add a loop in
check_subscriber() that waits until the walreceiver is available on the subscriber or
(b) use a timeout. The main advantage of (a) is that the primary slot is already
available, but I'm afraid we need an escape mechanism for the loop (timeout?).
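
For option (a), the loop could poll something along these lines on the
subscriber until it returns a row (bailing out after a timeout):

```sql
-- On the subscriber: is a walreceiver up and streaming yet?
SELECT 1 FROM pg_stat_wal_receiver WHERE status = 'streaming';
```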

I'll summarize all issues as soon as I finish the review of sync slot support. I
think we should avoid new development if we judge that the item can be
documented as a limitation for this version. Nevertheless, I will share patches
so you can give your opinion on whether it is an open item or new development.


--
Euler Taveira

Re: speed up a logical replica setup

From
Amit Kapila
Date:
On Wed, May 22, 2024 at 8:46 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
>
> > v2-0002: not changed
> >
>
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.
>
>
> Following the same line that simplifies the code, we can: (a) add a loop in
> check_subscriber() that waits until the walreceiver is available on the subscriber or
> (b) use a timeout. The main advantage of (a) is that the primary slot is already
> available, but I'm afraid we need an escape mechanism for the loop (timeout?).
>

Sorry, it is not clear to me why we need any additional loop in
check_subscriber(); aren't we talking about the problem in the
check_publisher() function?

Why in the first place do we need to ensure that primary_slot_name is
active on the primary? You mentioned something related to WAL
retention but I don't know how that is related to this tool's
functionality. If at all, we are bothered about WAL retention on the
primary that should be the WAL corresponding to consistent_lsn
computed by setup_publisher() but this check doesn't seem to ensure
that.

--
With Regards,
Amit Kapila.