Thread: Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
vignesh C
Date:
On Mon, 9 Dec 2024 at 16:25, Shubham Khanna <khannashubham1197@gmail.com> wrote:
>
> Hi all,
>
> I am writing to propose the addition of the two_phase option in
> pg_createsubscriber. As discussed in [1], supporting this feature
> during the development of pg_createsubscriber was planned for this
> version.

Yes this will be useful.

> Currently, pg_createsubscriber creates subscriptions with the
> two_phase option set to false. Enabling the two_phase option after a
> subscription has been created is not straightforward, as it requires
> the subscription to be disabled first. This patch aims to address this
> limitation by incorporating the two_phase option into
> pg_createsubscriber which will help create subscription with two_phase
> option and make use of the advantages of creating subscription with
> two_phase option.
> The attached patch has the changes for the same.

Few comments:
1) You can keep it with no argument similar to how dry-run is handled:
@@ -1872,6 +1881,7 @@ main(int argc, char **argv)
                {"publisher-server", required_argument, NULL, 'P'},
                {"socketdir", required_argument, NULL, 's'},
                {"recovery-timeout", required_argument, NULL, 't'},
+               {"two_phase", optional_argument, NULL, 'T'},
                {"subscriber-username", required_argument, NULL, 'U'},
                {"verbose", no_argument, NULL, 'v'},
                {"version", no_argument, NULL, 'V'},

2) After making it to no argument option, if this option is set, just
set the value, strcmp's are not required:
+                       case 'T':
+                               if (optarg != NULL)
+                               {
+                                       if (strcmp(optarg, "on") == 0)
+                                               opt.two_phase = true;
+                                       else if (strcmp(optarg, "off") == 0)
+                                               opt.two_phase = false;
+                                       else
+                                               pg_fatal("invalid
value for --two-phase: must be 'on' or 'off'");
+                               }
+                               else
+                                       opt.two_phase = true;   /*
Default to true if no argument
+
                  * is provided */
+                               break;

3) You can add a link to create subscription documentation page of
two_phase option here:
+       Enables or disables two-phase commit for the subscription.
+       When the option is provided without a value, it defaults to
+       <literal>on</literal>. Specify <literal>on</literal> to enable or
+       <literal>off</literal> to disable.
+       Two-phase commit ensures atomicity in logical replication for prepared
+       transactions. By default, this option is enabled unless explicitly set
+       to <literal>off</literal>.

4) Instead of adding new tests, can we include the prepare transaction
and prepare transaction verification to the existing tests itself?
+# Set up node A as primary
+my $node_a = PostgreSQL::Test::Cluster->new('node_a');
+my $aconnstr = $node_a->connstr;
+$node_a->init(allows_streaming => 'logical');
+$node_a->append_conf(
+       'postgresql.conf', qq[
+       autovacuum = off
+       wal_level = logical
+       max_wal_senders = 10
+       max_worker_processes = 8
+       max_prepared_transactions = 10
+]);
+
+$node_a->start;

Regards,
Vignesh



Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
vignesh C
Date:
On Tue, 10 Dec 2024 at 17:17, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Mon, Dec 9, 2024 at 7:43 PM vignesh C <vignesh21@gmail.com> wrote:
>
> The attached Patch contains the required changes.

Few comments:
1) This is not correct, currently enabling two_phase option using
alter subscription is supported:
Notably, the replication for prepared transactions functions regardless of the
initial two-phase setting on the replication slot. However, the user cannot
change the setting after the subscription is created unless a future command,
such as 'ALTER SUBSCRIPTION ... SET (two_phase = on)', is supported.
This provides flexibility for future enhancements.

2) You can enable-two-phase on a non dry-run mode test case, as the
verification will not be possible in dry-run mode:
 # pg_createsubscriber can run without --databases option
@@ -352,7 +355,7 @@ command_ok(
                $node_p->connstr($db1), '--socketdir',
                $node_s->host, '--subscriber-port',
                $node_s->port, '--replication-slot',
-               'replslot1'
+               'replslot1', '--enable-two-phase'

3) I felt this line can be removed "Two-phase commit ensures atomicity
in logical replication for prepared transactions." as this information
is available at prepare transaction and create subscription page
documentation:
+       <literal>off</literal>.
+       Two-phase commit ensures atomicity in logical replication for prepared
+       transactions. See the
+       <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+       documentation for more details.

4) This change is not required as it is not needed for the patch:
                dbinfo[i].made_replslot = false;
                dbinfo[i].made_publication = false;
+
                /* Fill subscriber attributes */
                conninfo = concat_conninfo_dbname(sub_base_conninfo, cell->val);

5) Similarly here too:
@@ -341,6 +343,7 @@ command_ok(
 $node_s->start;
 is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
        't', 'standby is in recovery');
+
 $node_s->stop;

Regards,
Vignesh



Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
Shubham Khanna
Date:
On Wed, Dec 11, 2024 at 10:59 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 10 Dec 2024 at 17:17, Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > On Mon, Dec 9, 2024 at 7:43 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > The attached Patch contains the required changes.
>
> Few comments:
> 1) This is not correct, currently enabling two_phase option using
> alter subscription is supported:
> Notably, the replication for prepared transactions functions regardless of the
> initial two-phase setting on the replication slot. However, the user cannot
> change the setting after the subscription is created unless a future command,
> such as 'ALTER SUBSCRIPTION ... SET (two_phase = on)', is supported.
> This provides flexibility for future enhancements.
>
> 2) You can enable-two-phase on a non dry-run mode test case, as the
> verification will not be possible in dry-run mode:
>  # pg_createsubscriber can run without --databases option
> @@ -352,7 +355,7 @@ command_ok(
>                 $node_p->connstr($db1), '--socketdir',
>                 $node_s->host, '--subscriber-port',
>                 $node_s->port, '--replication-slot',
> -               'replslot1'
> +               'replslot1', '--enable-two-phase'
>
> 3) I felt this line can be removed "Two-phase commit ensures atomicity
> in logical replication for prepared transactions." as this information
> is available at prepare transaction and create subscription page
> documentation:
> +       <literal>off</literal>.
> +       Two-phase commit ensures atomicity in logical replication for prepared
> +       transactions. See the
> +       <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
> +       documentation for more details.
>
> 4) This change is not required as it is not needed for the patch:
>                 dbinfo[i].made_replslot = false;
>                 dbinfo[i].made_publication = false;
> +
>                 /* Fill subscriber attributes */
>                 conninfo = concat_conninfo_dbname(sub_base_conninfo, cell->val);
>
> 5) Similarly here too:
> @@ -341,6 +343,7 @@ command_ok(
>  $node_s->start;
>  is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
>         't', 'standby is in recovery');
> +
>  $node_s->stop;
>

I have fixed the given comments. The v3 version patch attached at [1]
has the changes for the same.
[1] - https://www.postgresql.org/message-id/CAHv8RjJ8NuHLQUhkqE-fy5k%2B3nZUdX9QngrLaa7iS0TEJNicow%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
Peter Smith
Date:
Hi Shubham,

Here are some review comments for the patch v4-0001.

======
GENERAL.

1.
After reading Vignesh's last review and then the pg_createsubscriber
documentation I see there can be multiple databases simultaneously
specified (by writing multiple -d switches) and in that case each one
gets its own:
--publication
--replication-slot
--subscription

OTOH, this new '--enable-two-phase' switch is just a blanket two_phase
enablement across all subscriptions. There is no way for the user to
say if they want it enabled for some subscriptions (on some databases)
but not for others. I suppose this was intentional and OK (right?),
but this point needs to be clarified in the docs.

======
doc/src/sgml/ref/pg_createsubscriber.sgml

2.
+      <para>
+       Enables <link
linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+       commit for the subscription. The default is <literal>false</literal>.
+      </para>

Following on from the previous review comment. Since this switch is
applied to all database subscriptions I think the help needs to say
that. Something like.

"If there are multiple subscriptions specified, this option applies to
all of them."

~~~

3.
In the "Prerequisites" sections of the docs, it gives requirements for
various GUC settings on the source server and the target server. So,
should there be another note here advising to configure the
'max_prepared_transactions' GUC when the '--enable-two-phase' is
specified?

~~~

4. "Warnings" section includes the following:

pg_createsubscriber sets up logical replication with two-phase commit
disabled. This means that any prepared transactions will be replicated
at the time of COMMIT PREPARED, without advance preparation. Once
setup is complete, you can manually drop and re-create the
subscription(s) with the two_phase option enabled.

~

The above text is from the "Warnings" section, but it seems stale
information that needs to be updated due to the introduction of this
new '--enable-two-phase' option.

======
src/bin/pg_basebackup/pg_createsubscriber.c

usage:
5.
  printf(_("  -t, --recovery-timeout=SECS     seconds to wait for
recovery to end\n"));
+ printf(_("  -T, --enable-two-phase          enable two-phase commit
for the subscription\n"));

Given the previous review comments (#1/#2 etc), I was wondering if it
might be better to say more like "enable two-phase commit for all
subscriptions".

======
.../t/040_pg_createsubscriber.pl

6.
+is($count_prepared_s, qq(1), 'Prepared transaction replicated to subscriber');

Should there also have been an earlier check (way back before the
PREPARE) just to make sure this count was initially 0?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham,

> Thank you for pointing this out and for suggesting the changes. I
> agree with your approach.
> Also, I found a mistake in getopt_long and fixed it in this version of
> the patch.
> The attached patch contains the suggested changes.

Thanks for updating the patch. I think the patch looks mostly OK.

Regarding the test code - I think we should directly refer the pg_subscription catalog,
and confirm that subtwophase is 'p'. IIUC, we can wait until
all subscriptions. Subtwophasestate are 'e', by using poll_query_until() and [1].

[1]: SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
vignesh C
Date:
On Thu, 12 Dec 2024 at 08:14, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shubham,
>
> > Thank you for pointing this out and for suggesting the changes. I
> > agree with your approach.
> > Also, I found a mistake in getopt_long and fixed it in this version of
> > the patch.
> > The attached patch contains the suggested changes.
>
> Thanks for updating the patch. I think the patch looks mostly OK.
>
> Regarding the test code - I think we should directly refer the pg_subscription catalog,
> and confirm that subtwophase is 'p'. IIUC, we can wait until
> all subscriptions. Subtwophasestate are 'e', by using poll_query_until() and [1].
>
> [1]: SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');

Yes, that approach is better. Also this is not required after checking
using pg_subscription:
+# Prepare a transaction on the publisher
+$node_p->safe_psql(
+ $db1, qq[
+        BEGIN;
+        INSERT INTO tbl1 SELECT generate_series(1, 10);
+        PREPARE TRANSACTION 'test_prepare';
+]);
+
 # Start subscriber
 $node_s->start;

+# Verify that the prepared transaction is replicated to the subscriber
+my $count_prepared_s =
+  $node_s->safe_psql($db1, "SELECT count(*) FROM pg_prepared_xacts;");
+
+is($count_prepared_s, qq(1), 'Prepared transaction replicated to subscriber');

Regards,
Vignesh



Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
Shubham Khanna
Date:
On Thu, Dec 12, 2024 at 8:14 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shubham,
>
> > Thank you for pointing this out and for suggesting the changes. I
> > agree with your approach.
> > Also, I found a mistake in getopt_long and fixed it in this version of
> > the patch.
> > The attached patch contains the suggested changes.
>
> Thanks for updating the patch. I think the patch looks mostly OK.
>
> Regarding the test code - I think we should directly refer the pg_subscription catalog,
> and confirm that subtwophase is 'p'. IIUC, we can wait until
> all subscriptions. Subtwophasestate are 'e', by using poll_query_until() and [1].
>
> [1]: SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');
>

I have fixed the given comment. The v5 version patch attached at [1]
has the changes for the same.
[1] - https://www.postgresql.org/message-id/CAHv8Rj%2Bhd2MTNRs4AsK6%3DhRqvV6JC9g2tTAJwGjrNfXg6vhD8g%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
Shubham Khanna
Date:
On Thu, Dec 12, 2024 at 9:34 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 12 Dec 2024 at 08:14, Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Shubham,
> >
> > > Thank you for pointing this out and for suggesting the changes. I
> > > agree with your approach.
> > > Also, I found a mistake in getopt_long and fixed it in this version of
> > > the patch.
> > > The attached patch contains the suggested changes.
> >
> > Thanks for updating the patch. I think the patch looks mostly OK.
> >
> > Regarding the test code - I think we should directly refer the pg_subscription catalog,
> > and confirm that subtwophase is 'p'. IIUC, we can wait until
> > all subscriptions. Subtwophasestate are 'e', by using poll_query_until() and [1].
> >
> > [1]: SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');
>
> Yes, that approach is better. Also this is not required after checking
> using pg_subscription:
> +# Prepare a transaction on the publisher
> +$node_p->safe_psql(
> + $db1, qq[
> +        BEGIN;
> +        INSERT INTO tbl1 SELECT generate_series(1, 10);
> +        PREPARE TRANSACTION 'test_prepare';
> +]);
> +
>  # Start subscriber
>  $node_s->start;
>
> +# Verify that the prepared transaction is replicated to the subscriber
> +my $count_prepared_s =
> +  $node_s->safe_psql($db1, "SELECT count(*) FROM pg_prepared_xacts;");
> +
> +is($count_prepared_s, qq(1), 'Prepared transaction replicated to subscriber');
>

I have fixed the given comment. The v5 version patch attached at [1]
has the changes for the same.
[1] - https://www.postgresql.org/message-id/CAHv8Rj%2Bhd2MTNRs4AsK6%3DhRqvV6JC9g2tTAJwGjrNfXg6vhD8g%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.



Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
Peter Smith
Date:
Hi Shubham,

Here are my review comments for the patch v5-0001.

======
doc/src/sgml/ref/pg_createsubscriber.sgml

1.
-    must accept local connections.
+    must accept local connections. If you are planning to use the
+    --enable-two-phase switch then you will also need to set the
+    <xref linkend="guc-max-prepared-transactions"/> appropriately.

Should use sgml <option> markup for "--enable-two-phase".

~~~

2.
-    <application>pg_createsubscriber</application> sets up logical
-    replication with two-phase commit disabled.  This means that any
-    prepared transactions will be replicated at the time
-    of <command>COMMIT PREPARED</command>, without advance preparation.
-    Once setup is complete, you can manually drop and re-create the
-    subscription(s) with
+    If --enable-two-phase switch is not specified, the
+    <application>pg_createsubscriber</application> sets up logical replication
+    with two-phase commit disabled.  This means that any prepared transactions
+    will be replicated at the time of <command>COMMIT PREPARED</command>,
+    without advance preparation. Once setup is complete, you can manually drop
+    and re-create the subscription(s) with

Should use sgml <option> markup for "--enable-two-phase".

======
.../t/040_pg_createsubscriber.pl

3.
+# Verify that the subtwophase is 'p' in the pg_subscription catalog
+my $poll_query_until = $node_s->safe_psql('postgres',
+ "SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT
IN ('e');"
+);
+
+is($poll_query_until, qq(t),
+ 'Timed out while waiting for subscriber to enable twophase');

3a.
Hmm. Does this code match the comment? The comment says verify
subtwophase is 'p' (aka "pending") but AFAICT the code is actually
waiting until every subtwophase is 'e' (aka "enabled").

~

3b.
Also, if you are going to name char-codes (like 'p') in comments, it
might be helpful to include the equivalent words, saving readers from
having to search the documentation to find the meaning.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
Peter Smith
Date:
Hi Shubham.

Here are my review comments for v6-0001.

======
1.
+# Verify that the subtwophase is enabled ('e') in the pg_subscription catalog
+$node_s->poll_query_until('postgres',
+ "SELECT count(1) = 2 FROM pg_subscription WHERE subtwophasestate = 'e';")
+  or die "Timed out while waiting for subscriber to enable twophase";
+

This form of the SQL is probably OK but it's a bit tricky; Probably it
should have been explained in the comment about where that count "2"
has come from.

~~

I think it was OK as before (v5) to be querying until nothing was NOT
'e'. In other words, until everything was enabled 'e'.
SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');

~~

OTOH, to save execution time we really would be satisfied with both
'p' and 'e' states here. (we don't strictly need to wait for the
transition from 'p' to 'e' to occur).

So, SQL like the one below might be the best:

# Verify that all subtwophase states are pending or enabled,
# e.g. there are no subscriptions where subtwophase is disabled ('d').
SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate IN ('d')

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
vignesh C
Date:
On Fri, 13 Dec 2024 at 13:01, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Fri, Dec 13, 2024 at 12:20 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Shubham.
> >
> > Here are my review comments for v6-0001.
> >
> > ======
> > 1.
> > +# Verify that the subtwophase is enabled ('e') in the pg_subscription catalog
> > +$node_s->poll_query_until('postgres',
> > + "SELECT count(1) = 2 FROM pg_subscription WHERE subtwophasestate = 'e';")
> > +  or die "Timed out while waiting for subscriber to enable twophase";
> > +
> >
> > This form of the SQL is probably OK but it's a bit tricky; Probably it
> > should have been explained in the comment about where that count "2"
> > has come from.
> >
> > ~~
> >
> > I think it was OK as before (v5) to be querying until nothing was NOT
> > 'e'. In other words, until everything was enabled 'e'.
> > SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');
> >
> > ~~
> >
> > OTOH, to save execution time we really would be satisfied with both
> > 'p' and 'e' states here. (we don't strictly need to wait for the
> > transition from 'p' to 'e' to occur).
> >
> > So, SQL like the one below might be the best:
> >
> > # Verify that all subtwophase states are pending or enabled,
> > # e.g. there are no subscriptions where subtwophase is disabled ('d').
> > SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate IN ('d')
> >
>
> I have fixed the given comment. Since prepared transactions are not
> being used anymore, I have removed it from the test file.
> The attached patch contains the suggested changes.

Few comments:
1) Since we are providing --enable-two-phase option now and user can
create subscriptions with two-phase option this warning can be removed
now:
    <para>
-    <application>pg_createsubscriber</application> sets up logical
-    replication with two-phase commit disabled.  This means that any
-    prepared transactions will be replicated at the time
-    of <command>COMMIT PREPARED</command>, without advance preparation.
-    Once setup is complete, you can manually drop and re-create the
-    subscription(s) with
+    If <option>--enable-two-phase</option> is not specified, the
+    <application>pg_createsubscriber</application> sets up logical replication
+    with two-phase commit disabled.  This means that any prepared transactions
+    will be replicated at the time of <command>COMMIT PREPARED</command>,
+    without advance preparation. Once setup is complete, you can manually drop
+    and re-create the subscription(s) with
     the <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
-    option enabled.

2) Since we are not going to wait till the subscriptions are enabled,
we can use safe_psql instead of poll_query_until, something like:
is($node_s->safe_psql('postgres',
"SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate = 'd'"),
't', 'subscriptions are created with the two-phase option enabled');

instead of:
+$node_s->poll_query_until('postgres',
+       "SELECT count(1) = 0 FROM pg_subscription WHERE
subtwophasestate IN ('d');"
+);

3) This can be removed from the commit message:
Documentation has been updated to reflect the new option, and test cases have
been added to validate various scenarios, including proper validation of the
'--enable-two-phase' option and its combinations with other options.

4) Should" two-phase option" be "enable-two-phase option" here:
        const char *sub_username;       /* subscriber username */
+       bool            two_phase;              /* two-phase option */
        SimpleStringList database_names;        /* list of database names */

Regards,
Vignesh



Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

From
Peter Smith
Date:
Hi Shubham.

The patch v8-0001 looks good to me.

FYI, there are a few pending patches [1][2][3] that might have some
tests/docs overlap with this one, so be on the lookout because if
those get pushed first your patch may require rebasing.

======
[1]
https://www.postgresql.org/message-id/flat/CAHut%2BPsVMbDEkbGZxpb1vVPB8uKmhFgD3ow6PXoSs5ktzBXLbQ%40mail.gmail.com#df7b92b25077554439d9bb931103dd8a
[2]
https://www.postgresql.org/message-id/flat/CAHut%2BPtnA4DB_pcv4TDr4NjUSM1%3DP2N_cuZx5DX09k7LVmaqUA%40mail.gmail.com
[3]
https://www.postgresql.org/message-id/flat/CAHut%2BPv%2B96CykSY6-uLKZWaa6to9x1DurmyJh8Jmu1_1P7hp4Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia