RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers
From | wangw.fnst@fujitsu.com |
---|---|
Subject | RE: Data is copied twice when specifying both child and parent table in publication |
Date | |
Msg-id | OS3PR01MB6275A9B8C65C381C6828DF9D9E549@OS3PR01MB6275.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Data is copied twice when specifying both child and parent table in publication (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: Data is copied twice when specifying both child and parent table in publication
RE: Data is copied twice when specifying both child and parent table in publication |
List | pgsql-hackers |
On Tues, Sep 27, 2022 at 16:45 PM Peter Smith <smithpb2250@gmail.com> wrote: > Here are my review comments for the HEAD_v11-0001 patch: Thanks for your comments. > ====== > > 1. General - Another related bug? > > In [1] Hou-san wrote: > > For another case you mentioned (via_root used when publishing child) > CREATE PUBLICATION pub1 for TABLE parent; > CREATE PUBLICATION pub2 for TABLE child with (publish_via_partition_root); > CREATE SUBSCRIPTION sub connect xxx PUBLICATION pub1,pub2; > > The expected behavior is only the child table is published, all the changes > should be replicated using the child table's identity. We should do table sync > only for child tables and is same as the current behavior on HEAD. So, I think > there is no bug in this case. > > ~ > > That behaviour seems different to my understanding because the pgdocs > says when the via_root param is true the 'child' table would be using > the 'parent' identity: > > [2] publish_via_partition_root determines whether changes in a > partitioned table (or on its partitions) contained in the publication > will be published using the identity and schema of the partitioned > table rather than that of the individual partitions that are actually > changed. > > ~ > > So is this another bug (slightly different from the current one being > patched), or is it just some different special behaviour? If it's > another bug then you need to know that ASAP because I think you may > want to fix both of them at the same time which might impact how this > 2x data copy patch should be implemented. > > Or perhaps just the pgdocs need more notes about special > cases/combinations like this? > > ====== > > 2. General - documentation? > > For this current patch, IIUC it was decided that it is a bug because > the data gets duplicated, and then some sensible rule was decided that > this patch should use to address it (e.g. publishing a child combined > with publishing a parent via_root will just ignore the child's > publication...). > > So my question is - is this (new/fixed) behaviour something that a > user will be able to figure out themselves from the existing > documentation, or does this patch now need its own special notes in > the documentation? IMO this behaviour doesn't look like a bug. I think the behaviour of multiple publications with parameter publish_via_partition_root could be added to the pg-doc later in a separate patch. > ====== > > 3. src/backend/catalog/pg_publication.c - pg_get_publication_tables > > + foreach(lc, pub_elem_tables) > + { > + Oid *result = (Oid *) malloc(sizeof(Oid) * 2); > + > + result[0] = lfirst_oid(lc); > + result[1] = pub_elem->oid; > + table_infos = lappend(table_infos, result); > + } > > 3a. > It looks like each element in the table_infos list is a malloced obj > of 2x Oids (Oid of table, Oid of pub). IMO better to call this element > 'table_info' instead of the meaningless 'result' > > ~ > > 3b. > Actually, I think it would be better if this function defines a little > 2-element structure {Oid relid, Oid pubid} to use instead of this > array (which requires knowledge that [0] means relid and [1] means > pubid). > > ~~~ > > 4. > > + foreach(lc, table_infos) > + { > + Oid *table_info_tmp = (Oid *) lfirst(lc); > + > + if (!list_member_oid(tables, table_info_tmp[0])) > + table_infos = foreach_delete_current(table_infos, lc); > } > I think the '_tmp' suffix is not helpful here - IMO having another > relid variable would make this more self-explanatory. > > Or better yet adopt the suggestion o f #3b and have a little struct > with self-explanatory member names. Improved as suggested. > ===== > > 5. src/backend/commands/subscriptioncmds.c - fetch_table_list > > + if (server_version >= 160000) > + appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n" > > Since there is an else statement block, I think this would be more > readable if there was a statement block here too. YMMV > > SUGGESTION > if (server_version >= 160000) > { > appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n" > ... > } Improved as suggested. > ~~~ > > 6. > > + /* > + * Get the list of tables from publisher, the partition table whose > + * ancestor is also in this list will be ignored, otherwise the initial > + * data in the partition table would be replicated twice. > + */ > > 6a. > "from publisher, the partition" -> "from the publisher. The partition" > > ~ > > 6b. > This looks like a common comment that also applied to the "if" part, > so it seems more appropriate to move it to where I indicated below. > Perhaps the whole comment needs a bit of massaging after you move > it... > > + /* > + * Get namespace, relname and column list (if supported) of the tables > + * belonging to the specified publications. > + * > + * HERE <<<<<<<<< > + * > + * From version 16, the parameter of the function pg_get_publication_tables > + * can be an array of publications. The partition table whose ancestor is > + * also published in this publication array will be filtered out in this > + * function. > + */ Improved as suggested. Also rebased the patch because the change in the HEAD (20b6847). Attach the new patches. Regards, Wang wei
Attachment
pgsql-hackers by date: