Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id CA+HiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ@mail.gmail.com
Whole thread Raw
In response to RE: Data is copied twice when specifying both child and parent table in publication  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses Re: Data is copied twice when specifying both child and parent table in publication
List pgsql-hackers
On Thu, Oct 28, 2021 at 4:35 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> As there are basically two separate issues mentioned in the thread, I tried to
> summarize the discussion so far which might be helpful to others.
>
> * The first issue[1]:
>
> If we include both the partitioned table and (explicitly) its child partitions
> in the publication when set publish_via_partition_root=true, like:
> ---
> CREATE PUBLICATION pub FOR TABLE parent_table, child_table with (publish_via_partition_root=on);
> ---
> It could execute initial sync for both the partitioned(parent_table) table and
> (explicitly) its child partitions(child_table) which cause duplication of
> data in partition(child_table) in subscriber side.
>
> The reasons I considered this behavior a bug are:
>
> a) In this case, the behavior of initial sync is inconsistent with the behavior
> of transaction streaming. All changes in the leaf the partition will be applied
> using the identity and schema of the partitioned(root) table. But for the
> initial sync, it will execute sync for both the partitioned(root) table and
> (explicitly) its child partitions which cause duplication of data.
>
> b) The behavior of FOR TABLE is inconsistent with the behavior of FOR ALL TABLE.
> If user create a FOR ALL TABLE publication and set publish_via_partition_root=true,
> then only the top most partitioned(root) table will execute initial sync.
>
> IIRC, most people in this thread agreed that the current behavior is not
> expected. So, maybe it's time to try to fix it.
>
> Attach my fix patch here. The patch try to fix this issue by making the
> pg_publication_tables view only show partitioned table when
> publish_via_partition_root is true.
>
>
> * The second issue[2]:
> -----
> CREATE TABLE sale (sale_date date not null,country_code text, product_sku text,
> units integer) PARTITION BY RANGE (sale_date);
> CREATE TABLE sale_201901 PARTITION OF sale FOR VALUES FROM ('2019-01-01') TO
> ('2019-02-01');
> CREATE TABLE sale_201902 PARTITION OF sale FOR VALUES FROM ('2019-02-01') TO
> ('2019-03-01');
>
> (1) PUB:  CREATE PUBLICATION pub FOR TABLE sale_201901,
> sale_201902 WITH (publish_via_partition_root=true);
> (2) SUB:  CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres host=localhost port=5432' PUBLICATION pub;
> (3) PUB:  INSERT INTO sale VALUES('2019-01-01', 'AU', 'cpu', 5), ('2019-01-02', 'AU', 'disk', 8);
> (4) SUB:  SELECT * FROM sale;
> (5) PUB:  ALTER PUBLICATION pub ADD TABLE sale;
> (6) SUB:  ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
> (7) SUB:  SELECT * FROM sale;
> -----
>
> In step (7), we can see duplication of data.
>
> The reason is that the INSERTed data is first published though the partitions,
> since initially there is no partitioned table in the publication (so
> publish_via_partition_root=true doesn't have any effect). But then adding the
> partitioned table to the publication and refreshing the publication in the
> subscriber, the data is then published "using the identity and schema of the
> partitioned table" due to publish_via_partition_root=true.
> (Copied from Greg's analysis).
>
> Whether this behavior is correct is still under debate.
>
>
> Overall, I think the second issue still needs further discussion while the
> first issue seems clear that most people think it's unexpected. So, I think it
> might be better to fix the first issue.

Thanks for the summary, Hou-san, and sorry about my late reply.

I had thought about this some last week and I am coming around to
recognizing the confusing user experience of the current behavior.
Though, I am not sure if removing partitions from the result of
pg_publication_tables view for pubviaroot publications is acceptable
as a fix, because that prevents replicating into a subscriber node
where tables that are partition root and a partition respectively on
the publisher are independent tables on the subscriber.  ISTM that
what Amit K mentioned in his first reply may be closer to what we may
ultimately need to do, which is this:

"I think one possible idea to investigate is that on the
subscriber-side, after fetching tables, we check the already
subscribed tables and if the child tables already exist then we ignore
the parent table and vice versa."

I had also thought about a way to implement that a bit and part of
that is to make pg_publication_tables always expose leaf partitions,
that is, even if pubviaroot is true for a given publication.  So, when
puviaroot is true we include the partition root actually mentioned in
the publication as we do currently (changes streamed after initial
sync will always use its schema so the subscriber better know about
it), and also leaf partitions which we currently don't.  The latter
too so that a subscriber can determine which leaf partitions to sync
and which ones to not based on whether a given leaf partitions is
already known to the subscriber, which means their pg_subscription_rel
entries would need to be made which we don't currently.  Leaf
partition state entries will not be referred to after all of the known
ones are synced, because the subsequent changes will be streamed with
the root's schema.  The initial sync worker would need to be taught to
skip processing any partitioned tables that it sees, because now we'd
be handling the initial sync part via individual leaf partitions.

Though the thing that makes this a bit tricky to implement is that the
pg_publication_tables view exposes way less information to be able to
determine how the tables listed are related to each other.  So I think
we'd need to fix the view to return leaf partitions' owning root table
(if they're published through a pubviaroot publication) or rewrite the
query that fetch_table_list() uses to access the underlying catalogs
directly in the back-ported version.

That is roughly how I'd go about this.  Thoughts?

-- 
Amit Langote
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: inefficient loop in StandbyReleaseLockList()
Next
From: Gilles Darold
Date:
Subject: Re: [PATCH] Proposal for HIDDEN/INVISIBLE column