RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From shiy.fnst@fujitsu.com
Subject RE: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id OSZPR01MB6310829049C935E1F8D5344FFD8F9@OSZPR01MB6310.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Jacob Champion <jchampion@timescale.com>)
Responses Re: Data is copied twice when specifying both child and parent table in publication
List pgsql-hackers
On Fri, Mar 31, 2023 2:16 AM Jacob Champion <jchampion@timescale.com> wrote:
> 
> On Wed, Mar 29, 2023 at 2:00 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > Pushed.
> 
> While rebasing my logical-roots patch over the top of this, I ran into
> another situation where mixed viaroot settings can duplicate data. The
> key idea is to subscribe to two publications with mixed settings, as
> before, and add a partition root that's already been replicated with
> viaroot=false to the other publication with viaroot=true.
> 
>   pub=# CREATE TABLE part (a int) PARTITION BY RANGE (a);
>   pub=# CREATE PUBLICATION pub_all FOR ALL TABLES;
>   pub=# CREATE PUBLICATION pub_other FOR TABLE other WITH
> (publish_via_partition_root);
>   -- populate with data, then switch to subscription side
>   sub=# CREATE SUBSCRIPTION sub CONNECTION ... PUBLICATION pub_all,
> pub_other;
>   -- switch back to publication
>   pub=# ALTER PUBLICATION pub_other ADD TABLE part;
>   -- and back to subscription
>   sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
>   -- data is now duplicated
> 
> (Standalone reproduction attached.)
> 
> This is similar to what happens if you alter the
> publish_via_partition_root setting for an existing publication, but
> I'd argue it's easier to hit by accident. Is this part of the same
> class of bugs, or is it different (or even expected) behavior?
> 

I noticed that a similar problem has been discussed in this thread, see [1] [2]
[3] [4]. It seems complicated to fix it if we want to automatically skip tables
that have been synchronized previously by code, and this may overkill in some
cases (e.g. The target table in subscriber is not a partitioned table, and the
user want to synchronize all data in the partitioned table from the publisher).
Besides, it seems not a common case. So I'm not sure we should fix it. Maybe we
can just add some documentation for it as Peter mentioned.

[1] https://www.postgresql.org/message-id/CAJcOf-eQR_%3Dq0f4ZVHd342QdLvBd_995peSr4xCU05hrS3TeTg%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com
(thesecond issue in it)
 
[3] https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com
[4] https://www.postgresql.org/message-id/CAA4eK1%2BNWreG%3D2sKiMz8vFzTsFhEHCjgQMyAu6zj3sdLmcheYg%40mail.gmail.com

Regards,
Shi Yu

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Minimal logical decoding on standbys
Next
From: Jeff Davis
Date:
Subject: Re: ICU locale validation / canonicalization