RE: speed up a logical replica setup - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: speed up a logical replica setup
Date
Msg-id TYCPR01MB12077EFFBA0302913CCE21C96F5562@TYCPR01MB12077.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: speed up a logical replica setup  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
Dear Vignesh,

> Few comments:
> 1) The below code can lead to assertion failure if the publisher is
> stopped while dropping the replication slot:
> +       if (primary_slot_name != NULL)
> +       {
> +               conn = connect_database(dbinfo[0].pubconninfo);
> +               if (conn != NULL)
> +               {
> +                       drop_replication_slot(conn, &dbinfo[0],
> primary_slot_name);
> +               }
> +               else
> +               {
> +                       pg_log_warning("could not drop replication
> slot \"%s\" on primary",
> +                                                  primary_slot_name);
> +                       pg_log_warning_hint("Drop this replication
> slot soon to avoid retention of WAL files.");
> +               }
> +               disconnect_database(conn);
> +       }
> 
> pg_createsubscriber: error: connection to database failed: connection
> to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or
> directory
> Is the server running locally and accepting connections on that socket?
> pg_createsubscriber: warning: could not drop replication slot
> "standby_1" on primary
> pg_createsubscriber: hint: Drop this replication slot soon to avoid
> retention of WAL files.
> pg_createsubscriber: pg_createsubscriber.c:432: disconnect_database:
> Assertion `conn != ((void *)0)' failed.
> Aborted (core dumped)
> 
> This is happening because we are calling disconnect_database in case
> of connection failure case too which has the following assert:
> +static void
> +disconnect_database(PGconn *conn)
> +{
> +       Assert(conn != NULL);
> +
> +       PQfinish(conn);
> +}

Right. disconnect_database() was moved to if (conn != NULL) block.

> 2) There is a CheckDataVersion function which does exactly this, will
> we be able to use this:
> +       /* Check standby server version */
> +       if ((ver_fd = fopen(versionfile, "r")) == NULL)
> +               pg_fatal("could not open file \"%s\" for reading: %m",
> versionfile);
> +
> +       /* Version number has to be the first line read */
> +       if (!fgets(rawline, sizeof(rawline), ver_fd))
> +       {
> +               if (!ferror(ver_fd))
> +                       pg_fatal("unexpected empty file \"%s\"", versionfile);
> +               else
> +                       pg_fatal("could not read file \"%s\": %m", versionfile);
> +       }
> +
> +       /* Strip trailing newline and carriage return */
> +       (void) pg_strip_crlf(rawline);
> +
> +       if (strcmp(rawline, PG_MAJORVERSION) != 0)
> +       {
> +               pg_log_error("standby server is of wrong version");
> +               pg_log_error_detail("File \"%s\" contains \"%s\",
> which is not compatible with this program's version \"%s\".",
> +                                                       versionfile,
> rawline, PG_MAJORVERSION);
> +               exit(1);
> +       }
> +
> +       fclose(ver_fd);


> 3) Should this be added to typedefs.list:
> +enum WaitPMResult
> +{
> +       POSTMASTER_READY,
> +       POSTMASTER_STILL_STARTING
> +};

But the comment from Peter E. [1] was opposite. I did not handle this.

> 4) pgCreateSubscriber should be mentioned after pg_controldata to keep
> the ordering consistency:
> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
> index aa94f6adf6..c5edd244ef 100644
> --- a/doc/src/sgml/reference.sgml
> +++ b/doc/src/sgml/reference.sgml
> @@ -285,6 +285,7 @@
>     &pgCtl;
>     &pgResetwal;
>     &pgRewind;
> +   &pgCreateSubscriber;
>     &pgtestfsync;

This has been already pointed out by Peter E. I did not handle this.

> 5) Here pg_replication_slots should be pg_catalog.pg_replication_slots:
> +       if (primary_slot_name)
> +       {
> +               appendPQExpBuffer(str,
> +                                                 "SELECT 1 FROM
> pg_replication_slots "
> +                                                 "WHERE active AND
> slot_name = '%s'",
> +                                                 primary_slot_name);

Fixed.

> 6) Here pg_settings should be pg_catalog.pg_settings:
> +        * - max_worker_processes >= 1 + number of dbs to be converted
> +
> *------------------------------------------------------------------------
> +        */
> +       res = PQexec(conn,
> +                                "SELECT setting FROM pg_settings
> WHERE name IN ("
> +                                "'max_logical_replication_workers', "
> +                                "'max_replication_slots', "
> +                                "'max_worker_processes', "
> +                                "'primary_slot_name') "
> +                                "ORDER BY name");

Fixed.

New version can be available in [2]

[1]: https://www.postgresql.org/message-id/3ee79f2c-e8b3-4342-857c-a31b87e1afda%40eisentraut.org
[2]:
https://www.postgresql.org/message-id/TYCPR01MB12077CD333376B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


pgsql-hackers by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: speed up a logical replica setup
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: speed up a logical replica setup