Home > mailing lists

RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From	wangw.fnst@fujitsu.com
Subject	RE: Data is copied twice when specifying both child and parent table in publication
Date	September 28, 2022 08:35:33
Msg-id	OS3PR01MB6275A9B8C65C381C6828DF9D9E549@OS3PR01MB6275.jpnprd01.prod.outlook.com Whole thread Raw
In response to	Re: Data is copied twice when specifying both child and parent table in publication (Peter Smith <smithpb2250@gmail.com>)
Responses	Re: Data is copied twice when specifying both child and parent table in publication RE: Data is copied twice when specifying both child and parent table in publication
List	pgsql-hackers

Tree view

On Tues, Sep 27, 2022 at 16:45 PM Peter Smith <smithpb2250@gmail.com> wrote:
> Here are my review comments for the HEAD_v11-0001 patch:

Thanks for your comments.

> ======
> 
> 1. General - Another related bug?
> 
> In [1] Hou-san wrote:
> 
> For another case you mentioned (via_root used when publishing child)
> CREATE PUBLICATION pub1 for TABLE parent;
> CREATE PUBLICATION pub2 for TABLE child with (publish_via_partition_root);
> CREATE SUBSCRIPTION sub connect xxx PUBLICATION pub1,pub2;
> 
> The expected behavior is only the child table is published, all the changes
> should be replicated using the child table's identity. We should do table sync
> only for child tables and is same as the current behavior on HEAD. So, I think
> there is no bug in this case.
> 
> ~
> 
> That behaviour seems different to my understanding because the pgdocs
> says when the via_root param is true the 'child' table would be using
> the 'parent' identity:
> 
> [2] publish_via_partition_root determines whether changes in a
> partitioned table (or on its partitions) contained in the publication
> will be published using the identity and schema of the partitioned
> table rather than that of the individual partitions that are actually
> changed.
> 
> ~
> 
> So is this another bug (slightly different from the current one being
> patched), or is it just some different special behaviour? If it's
> another bug then you need to know that ASAP because I think you may
> want to fix both of them at the same time which might impact how this
> 2x data copy patch should be implemented.
> 
> Or perhaps just the pgdocs need more notes about special
> cases/combinations like this?
> 
> ======
> 
> 2. General - documentation?
> 
> For this current patch, IIUC it was decided that it is a bug because
> the data gets duplicated, and then some sensible rule was decided that
> this patch should use to address it (e.g. publishing a child combined
> with publishing a parent via_root will just ignore the child's
> publication...).
> 
> So my question is - is this (new/fixed) behaviour something that a
> user will be able to figure out themselves from the existing
> documentation, or does this patch now need its own special notes in
> the documentation?

IMO this behaviour doesn't look like a bug.
I think the behaviour of multiple publications with parameter
publish_via_partition_root could be added to the pg-doc later in a separate
patch.

> ======
> 
> 3. src/backend/catalog/pg_publication.c - pg_get_publication_tables
> 
> + foreach(lc, pub_elem_tables)
> + {
> + Oid *result = (Oid *) malloc(sizeof(Oid) * 2);
> +
> + result[0] = lfirst_oid(lc);
> + result[1] = pub_elem->oid;
> + table_infos = lappend(table_infos, result);
> + }
> 
> 3a.
> It looks like each element in the table_infos list is a malloced obj
> of 2x Oids (Oid of table, Oid of pub). IMO better to call this element
> 'table_info' instead of the meaningless 'result'
> 
> ~
> 
> 3b.
> Actually, I think it would be better if this function defines a little
> 2-element structure {Oid relid, Oid pubid} to use instead of this
> array (which requires knowledge that [0] means relid and [1] means
> pubid).
> 
> ~~~
> 
> 4.
> 
> + foreach(lc, table_infos)
> + {
> + Oid *table_info_tmp = (Oid *) lfirst(lc);
> +
> + if (!list_member_oid(tables, table_info_tmp[0]))
> + table_infos = foreach_delete_current(table_infos, lc);
>   }
> I think the '_tmp' suffix is not helpful here - IMO having another
> relid variable would make this more self-explanatory.
> 
> Or better yet adopt the suggestion o f #3b and have a little struct
> with self-explanatory member names.

Improved as suggested.

> =====
> 
> 5. src/backend/commands/subscriptioncmds.c - fetch_table_list
> 
> + if (server_version >= 160000)
> + appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n"
> 
> Since there is an else statement block, I think this would be more
> readable if there was a statement block here too. YMMV
> 
> SUGGESTION
> if (server_version >= 160000)
> {
> appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n"
> ...
> }

Improved as suggested.

> ~~~
> 
> 6.
> 
> + /*
> + * Get the list of tables from publisher, the partition table whose
> + * ancestor is also in this list will be ignored, otherwise the initial
> + * data in the partition table would be replicated twice.
> + */
> 
> 6a.
> "from publisher, the partition" -> "from the publisher. The partition"
> 
> ~
> 
> 6b.
> This looks like a common comment that also applied to the "if" part,
> so it seems more appropriate to move it to where I indicated below.
> Perhaps the whole comment needs a bit of massaging after you move
> it...
> 
> + /*
> + * Get namespace, relname and column list (if supported) of the tables
> + * belonging to the specified publications.
> + *
> + * HERE <<<<<<<<<
> + *
> + * From version 16, the parameter of the function pg_get_publication_tables
> + * can be an array of publications. The partition table whose ancestor is
> + * also published in this publication array will be filtered out in this
> + * function.
> + */

Improved as suggested.

Also rebased the patch because the change in the HEAD (20b6847).

Attach the new patches.

Regards,
Wang wei

Attachment

pgsql-hackers by date:

From: Dilip Kumar
Date: 28 September 2022, 08:25:35
Subject: Re: longfin and tamandua aren't too happy but I'm not sure why

From: Maxim Orlov
Date: 28 September 2022, 08:36:40
Subject: Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

Attachment

Previous

Next