Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id CAHut+PtvAih2-8qKyTWbfEd4=4ia=TfsM4hzwTPQGajPKJ-1bA@mail.gmail.com
Whole thread Raw
In response to RE: Data is copied twice when specifying both child and parent table in publication  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
Responses Re: Data is copied twice when specifying both child and parent table in publication  (Amit Kapila <amit.kapila16@gmail.com>)
RE: Data is copied twice when specifying both child and parent table in publication  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
List pgsql-hackers
Hi Wang-san. I looked at the v21-0001 patch.

I don't have any new review comments -- only follow-ups for some of my
previous v20 comments that were rejected.

On Thu, Mar 23, 2023 at 8:11 PM wangw.fnst@fujitsu.com
<wangw.fnst@fujitsu.com> wrote:
>
> On Thu, Mar 23, 2023 at 12:27 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > Here are some review comments for patch v20-0001.
>
...
> > ======
> > src/test/subscription/t/013_partition.pl
> >
> > 5.
> > I think maybe you could have another test scenario where you INSERT
> > something into tab4_1_1 while only the publication for tab4_1 has
> > publish_via_partition_root=true
>
> I'm not sure if this scenario is necessary.
>

Please see my reply for #7 below.

> > ~~~
> >
> > 6.
> > AFAICT the tab4 tests are only testing the initial sync, but are not
> > testing normal replication. Maybe some more normal (post sync) INSERTS
> > are needed to tab4, tab4_1, tab4_1_1 ?
>
> Since I think the scenario we fixed is sync and not replication, it doesn't seem
> like we should extend the test you mentioned.
>

Maybe you are right. I only thought it would be better to have testing
which verifies that the sync phase and the normal replication phase
are using the same rules.

> > ======
> > src/test/subscription/t/028_row_filter.pl
> >
> > 7.
> > +# insert data into partitioned table.
> > +$node_publisher->safe_psql('postgres',
> > + "INSERT INTO tab_rowfilter_viaroot_part(a) VALUES(13), (17)");
> > +
> >  $node_subscriber->safe_psql('postgres',
> >   "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
> > application_name=$appname' PUBLICATION tap_pub_1, tap_pub_2,
> > tap_pub_3, tap_pub_4a, tap_pub_4b, tap_pub_5a, tap_pub_5b,
> > tap_pub_toast, tap_pub_inherits, tap_pub_viaroot_2, tap_pub_viaroot_1"
> >  );
> > @@ -707,13 +711,17 @@ is($result, qq(t|1), 'check replicated rows to
> > tab_rowfilter_toast');
> >  # the row filter for the top-level ancestor:
> >  #
> >  # tab_rowfilter_viaroot_part filter is: (a > 15)
> > +# - INSERT (13)        NO, 13 < 15
> >  # - INSERT (14)        NO, 14 < 15
> >  # - INSERT (15)        NO, 15 = 15
> >  # - INSERT (16)        YES, 16 > 15
> > +# - INSERT (17)        YES, 17 > 15
> >  $result =
> >    $node_subscriber->safe_psql('postgres',
> > - "SELECT a FROM tab_rowfilter_viaroot_part");
> > -is($result, qq(16), 'check replicated rows to tab_rowfilter_viaroot_part');
> > + "SELECT a FROM tab_rowfilter_viaroot_part ORDER BY 1");
> > +is( $result, qq(16
> > +17),
> > + 'check replicated rows to tab_rowfilter_viaroot_part');
> >
> > ~
> >
> > I'm not 100% sure this is testing quite what you want to test. AFAICT
> > the subscription is created like:
> >
> > "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
> > application_name=$appname' PUBLICATION tap_pub_1, tap_pub_2,
> > tap_pub_3, tap_pub_4a, tap_pub_4b, tap_pub_5a, tap_pub_5b,
> > tap_pub_toast, tap_pub_inherits, tap_pub_viaroot_2, tap_pub_viaroot_1"
>
> I think this is the scenario we fixed : Simultaneously subscribing to two
> publications that publish the parent and child respectively, then want to use
> the parent's identity and schema).
>

Yeah, but currently BOTH the tap_pub_viaroot_2, tap_pub_viaroot_1 are
using "WITH (publish_via_partition_root)", so IMO the user would
surely expect that only the root table would be published even when a
subscription combines those publications. OTOH, I thought a subtle
point of this patch is that now the same result will happen even if
only ONE of the publications was using "WITH
(publish_via_partition_root)". So it’s that scenario of “only ONE
publication is using the option” that I thought ought to be explicitly
tested.

This was the same also reason for my comment #5 above.

> > ======
> > src/test/subscription/t/031_column_list.pl
> >
> > 8.
> > - CREATE PUBLICATION pub_root_true FOR TABLE test_root (a) WITH
> > (publish_via_partition_root = true);
> > + CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a) WITH
> > (publish_via_partition_root = true);
> > + CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH
> > (publish_via_partition_root = true);
> >
> >   -- initial data
> >   INSERT INTO test_root VALUES (1, 2, 3);
> >   INSERT INTO test_root VALUES (10, 20, 30);
> >  ));
> >
> > +# Subscribe to pub_root_true_1 and pub_root_true_2 at the same time, which
> > +# means that the initial data will be synced once, and only the column list of
> > +# the parent table (test_root) in the publication pub_root_true_1 will be used
> > +# for both table sync and data replication.
> >  $node_subscriber->safe_psql(
> >   'postgres', qq(
> > - CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION
> > pub_root_true;
> > + CREATE
> >
> > ~
> >
> > (This is similar to the previous review comment #7 above)
> >
> > Won't it be a better test of the "At least one" code when only the
> > publication of partition (test_root_1) is using "WITH
> > (publish_via_partition_root = true)".
> >
> > e.g
> > CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a);
> > CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH
> > (publish_via_partition_root = true);
>
> I think specifying one or both is the same scenario here.
> But it seemed clearer if only the "via_root" option is specified in the
> publication that publishes the parent, so I changed this point in
> "031_column_list.pl". Since the publications in "028_row_filter.pl" were
> introduced by other commits, I didn't change it.
>

In hindsight, I think those publications should be renamed to
something more appropriate. The name "pub_root_true_2" seems
misleading now since the publish_via_partition_root = false

e.g.1. pub_test_root, pub_test_root_1
or
e.g.2. pub_root_true, pub_root_1_false
etc.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Commitfest 2023-03 starting tomorrow!
Next
From: Justin Pryzby
Date:
Subject: Re: Improve logging when using Huge Pages