Thread: PG Docs for ALTER SUBSCRIPTION REFRESH PUBLICATION - copy_data option

PG Docs for ALTER SUBSCRIPTION REFRESH PUBLICATION - copy_data option

From
Peter Smith
Date:
Hi.

(For brevity, in this mail I refer to "ALTER SUBSCRIPTION sub REFRESH
PUBLICATION" as "ASRP")

--

The PG Docs for ASRP WITH (copy_data = true), says "(Previously
subscribed tables are not copied.)" [1].

I thought this rule meant that only tables which got added by ALTER
PUBLICATION pubname ADD TABLE ... [2] after the CREATE SUBSCRIPTION
would be affected by the copy_data.

But I recently learned that when there are partitions in the
publication, then toggling the value of the PUBLICATION option
"publish_via_partition_root" [3] can also *implicitly* change the list
published tables, and therefore that too might cause any ASRP to make
use of the copy_data value for those implicitly added
partitions/tables.

It seems a bit too subtle.

I was wondering if this should be made more obvious by a note added to
the PG Docs for the ASRP [1]. e.g. "Previously subscribed tables are
not copied. Note: Tables may also be newly subscribed by changes to
the publish_via_partition_root option [link]"

Or perhaps, the "publish_via_partition_root option" Docs [3] should
say something. e.g. "Note: Changing this option can affect the ASRP
copy_data [link].

Thoughts?

-----
[1] https://www.postgresql.org/docs/devel/sql-altersubscription.html
[2] https://www.postgresql.org/docs/devel/sql-alterpublication.html
[3] https://www.postgresql.org/docs/devel/sql-createpublication.html

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: PG Docs for ALTER SUBSCRIPTION REFRESH PUBLICATION - copy_data option

From
Peter Smith
Date:
(One month has passed since my original post but there have been no
replies to it).

It seems like the original post maybe just got buried with too many
other mails so I am "bumping" this thread to elicit some response
for/against the suggestion.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: PG Docs for ALTER SUBSCRIPTION REFRESH PUBLICATION - copy_data option

From
Amit Kapila
Date:
On Fri, Jun 25, 2021 at 9:20 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> But I recently learned that when there are partitions in the
> publication, then toggling the value of the PUBLICATION option
> "publish_via_partition_root" [3] can also *implicitly* change the list
> published tables, and therefore that too might cause any ASRP to make
> use of the copy_data value for those implicitly added
> partitions/tables.
>

I have tried the below example in this context but didn't see any
effect on changing via_root option.

Set up on both publisher and subscriber:
=================================
CREATE TABLE tab2 (a int PRIMARY KEY, b text) PARTITION BY LIST (a);
CREATE TABLE tab2_1 (b text, a int NOT NULL);
ALTER TABLE tab2 ATTACH PARTITION tab2_1 FOR VALUES IN (0, 1, 2, 3);
CREATE TABLE tab2_2 PARTITION OF tab2 FOR VALUES IN (5, 6);

Publisher:
==========
CREATE PUBLICATION pub_viaroot FOR TABLE tab2_2;
postgres=# INSERT INTO tab2 VALUES (1), (0), (3), (5);
INSERT 0 4
postgres=# select * from tab2_1;
 b | a
---+---
   | 1
   | 0
   | 3
(3 rows)
postgres=# select * from tab2_2;
 a | b
---+---
 5 |
(1 row)


Subscriber:
==========
CREATE SUBSCRIPTION sub_viaroot CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION pub_viaroot;
postgres=# select * from tab2_2;
 a | b
---+---
 5 |
(1 row)
postgres=# select * from tab2_1;
 b | a
---+---
(0 rows)

So, by this step, we can see the partition which is not subscribed is
not copied. Now, let's toggle via_root option.
Publisher
=========
Alter Publication pub_viaroot Set (publish_via_partition_root = true);

Subscriber
==========
postgres=# Alter Subscription sub_viaroot Refresh Publication;
ALTER SUBSCRIPTION
postgres=# select * from tab2_2;
 a | b
---+---
 5 |
(1 row)
postgres=# select * from tab2_1;
 b | a
---+---
(0 rows)

As per your explanation, one can expect the data in tab2_1 in the last
step. Can you explain with example?

-- 
With Regards,
Amit Kapila.



Re: PG Docs for ALTER SUBSCRIPTION REFRESH PUBLICATION - copy_data option

From
Peter Smith
Date:
On Tue, Sep 14, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jun 25, 2021 at 9:20 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > But I recently learned that when there are partitions in the
> > publication, then toggling the value of the PUBLICATION option
> > "publish_via_partition_root" [3] can also *implicitly* change the list
> > published tables, and therefore that too might cause any ASRP to make
> > use of the copy_data value for those implicitly added
> > partitions/tables.
> >
>
> I have tried the below example in this context but didn't see any
> effect on changing via_root option.

Thanks for trying to reproduce. I also thought your steps were the
same as what I'd previously done but it seems like it was a bit
different. Below are my steps to observe some unexpected COPY
happening. Actually, now I am no longer sure if this is just a
documentation issue; perhaps it is a bug.

STEP 1 - create partition tables on both sides
===================================

[PUB and SUB]

postgres=# create table troot (a int) partition by range(a);
CREATE TABLE
postgres=# create table tless10 partition of troot for values from (1) to (9);
CREATE TABLE
postgres=# create table tmore10 partition of troot for values from (10) to (99);
CREATE TABLE

STEP 2 - insert some data on pub-side
==============================

[PUB]

postgres=# insert into troot values (1),(2),(3);
INSERT 0 3
postgres=# insert into troot values (11),(12),(13);
INSERT 0 3

postgres=# select * from troot;
 a
----
  1
  2
  3
 11
 12
 13
(6 rows)

STEP 3 - create a publication on the partition root
======================================

[PUB]

postgres=# CREATE PUBLICATION pub1 FOR TABLE troot;
CREATE PUBLICATION
postgres=# \dRp+ pub1;
                              Publication pub1
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
----------+------------+---------+---------+---------+-----------+----------
 postgres | f          | t       | t       | t       | t         | f
Tables:
    "public.troot"


STEP 4 - create the subscriber
=======================

[SUB]

postgres=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=127.0.0.1
port=5432 dbname=postgres' PUBLICATION pub1;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
postgres=# 2021-09-15 12:45:12.224 AEST [30592] LOG:  logical
replication apply worker for subscription "sub1" has started
2021-09-15 12:45:12.236 AEST [30595] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tless10" has
started
2021-09-15 12:45:12.247 AEST [30598] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tmore10" has
started
2021-09-15 12:45:12.326 AEST [30595] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tless10" has
finished
2021-09-15 12:45:12.332 AEST [30598] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tmore10" has
finished

postgres=# select * from troot;
 a
----
  1
  2
  3
 11
 12
 13
(6 rows)

// To this point, everything looks OK...

STEP 5 - toggle the publish_via_partition_root flag
======================================

[PUB]

postgres=# alter publication pub1 set (publish_via_partition_root = true);
ALTER PUBLICATION
postgres=# \dRp+ pub1;
                              Publication pub1
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
----------+------------+---------+---------+---------+-----------+----------
 postgres | f          | t       | t       | t       | t         | t
Tables:
    "public.troot"

// And then refresh the subscriber

[SUB]

postgres=# alter subscription sub1 refresh PUBLICATION;
ALTER SUBSCRIPTION
postgres=# 2021-09-15 12:48:37.927 AEST [3861] LOG:  logical
replication table synchronization worker for subscription "sub1",
table "troot" has started
2021-09-15 12:48:37.977 AEST [3861] LOG:  logical replication table
synchronization worker for subscription "sub1", table "troot" has
finished

// Notice above that another tablesync worker has launched and copied
everything again - BUG??

[SUB]

postgres=# select * from troot;
 a
----
  1
  2
  3
  1
  2
  3
 11
 12
 13
 11
 12
 13
(12 rows)

// At this point if I would keep toggling the
publish_via_partition_root then each time I do subscription REFRESH
PUBLICATION it will copy the data yet again. For example,

[PUB]

postgres=# alter publication pub1 set (publish_via_partition_root = false);
ALTER PUBLICATION

[SUB]

postgres=# alter subscription sub1 refresh PUBLICATION;
ALTER SUBSCRIPTION
postgres=# 2021-09-15 12:59:02.106 AEST [21709] LOG:  logical
replication table synchronization worker for subscription "sub1",
table "tless10" has started
2021-09-15 12:59:02.120 AEST [21711] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tmore10" has
started
2021-09-15 12:59:02.189 AEST [21709] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tless10" has
finished
2021-09-15 12:59:02.207 AEST [21711] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tmore10" has
finished

By now the pub/sub data on each side is quite different
==========================================

[PUB]

postgres=# select count(*) from troot;
 count
-------
     6
(1 row)

[SUB]

postgres=# select count(*) from troot;
 count
-------
    18
(1 row)


------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: PG Docs for ALTER SUBSCRIPTION REFRESH PUBLICATION - copy_data option

From
Amit Kapila
Date:
On Wed, Sep 15, 2021 at 8:49 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Sep 14, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jun 25, 2021 at 9:20 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > But I recently learned that when there are partitions in the
> > > publication, then toggling the value of the PUBLICATION option
> > > "publish_via_partition_root" [3] can also *implicitly* change the list
> > > published tables, and therefore that too might cause any ASRP to make
> > > use of the copy_data value for those implicitly added
> > > partitions/tables.
> > >
> >
> > I have tried the below example in this context but didn't see any
> > effect on changing via_root option.
>
> Thanks for trying to reproduce. I also thought your steps were the
> same as what I'd previously done but it seems like it was a bit
> different. Below are my steps to observe some unexpected COPY
> happening. Actually, now I am no longer sure if this is just a
> documentation issue; perhaps it is a bug.
>

Yeah, this looks odd to me as well.

-- 
With Regards,
Amit Kapila.