Re: speed up a logical replica setup - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: speed up a logical replica setup
Date
Msg-id b9aa614c-84ba-a869-582f-8d5e3ab57424@enterprisedb.com
Whole thread Raw
In response to speed up a logical replica setup  ("Euler Taveira" <euler@eulerto.com>)
Responses Re: speed up a logical replica setup  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
On 21.02.22 13:09, Euler Taveira wrote:
> A new tool called pg_subscriber does this conversion and is tightly 
> integrated
> with Postgres.

Are we comfortable with the name pg_subscriber?  It seems too general.
Are we planning other subscriber-related operations in the future?  If
so, we should at least make this one use a --create option or
something like that.

  doc/src/sgml/ref/pg_subscriber.sgml

Attached is a patch that reorganizes the man page a bit.  I moved the
description of the steps to the Notes section and formatted it
differently.  I think the steps are interesting but not essential for
the using of the program, so I wanted to get them out of the main
description.

  src/bin/pg_subscriber/pg_subscriber.c

+   if (made_temp_replslot)
+   {
+       conn = connect_database(dbinfo[0].pubconninfo, true);
+       drop_replication_slot(conn, &dbinfo[0], temp_replslot);
+       disconnect_database(conn);
+   }

Temp slots don't need to be cleaned up.

+/*
+ * Obtain the system identifier from control file. It will be used to 
compare
+ * if a data directory is a clone of another one. This routine is used 
locally
+ * and avoids a replication connection.
+ */
+static char *
+get_control_from_datadir(const char *datadir)

This could return uint64 directly, without string conversion.
get_sysid_from_conn() could then convert to uint64 internally.

+       {"verbose", no_argument, NULL, 'v'},

I'm not sure if the --verbose option is all that useful.

+       {"stop-subscriber", no_argument, NULL, 1},

This option doesn't seem to be otherwise supported or documented.

+   pub_sysid = pg_malloc(32);
+   pub_sysid = get_sysid_from_conn(dbinfo[0].pubconninfo);
+   sub_sysid = pg_malloc(32);
+   sub_sysid = get_control_from_datadir(subscriber_dir);

These mallocs don't appears to be of any use.

+   dbname_conninfo = pg_malloc(NAMEDATALEN);

This seems wrong.

Overall, this code could use a little bit more structuring.  There are
a lot of helper functions that don't seem to do a lot and are mostly
duplicate runs-this-SQL-command calls.  But the main() function is
still huge.  There is room for refinement.

  src/bin/pg_subscriber/t/001_basic.pl

Good start, but obviously, we'll need some real test cases here also.

  src/bin/initdb/initdb.c
  src/bin/pg_ctl/pg_ctl.c
  src/common/file_utils.c
  src/include/common/file_utils.h

I recommend skipping this refactoring.  The readfile() function from
pg_ctl is not general enough to warrant the pride of place of a
globally available function.  Note that it is specifically geared
toward some of pg_ctl's requirements, for example that the underlying
file can change while it is being read.

The requirements of pg_subscriber can be satisfied more easily: Just
call pg_ctl to start the server.  You are already using that in
pg_subscriber.  Is there a reason it can't be used here as well?
Attachment

pgsql-hackers by date:

Previous
From: Dong Wook Lee
Date:
Subject: Re: Add pg_freespacemap extension sql test
Next
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: Skipping logical replication transactions on subscriber side