Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Data is copied twice when specifying both child and parent table in publication |
Date | |
Msg-id | CAHut+PtTN1Udug3x1eZnDdh8Z6PL5VN=xz1b44VbB9T1z_PtcQ@mail.gmail.com Whole thread Raw |
In response to | RE: Data is copied twice when specifying both child and parent table in publication ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>) |
Responses |
RE: Data is copied twice when specifying both child and parent table in publication
|
List | pgsql-hackers |
Here are my review comments for the HEAD_v11-0001 patch: ====== 1. General - Another related bug? In [1] Hou-san wrote: For another case you mentioned (via_root used when publishing child) CREATE PUBLICATION pub1 for TABLE parent; CREATE PUBLICATION pub2 for TABLE child with (publish_via_partition_root); CREATE SUBSCRIPTION sub connect xxx PUBLICATION pub1,pub2; The expected behavior is only the child table is published, all the changes should be replicated using the child table's identity. We should do table sync only for child tables and is same as the current behavior on HEAD. So, I think there is no bug in this case. ~ That behaviour seems different to my understanding because the pgdocs says when the via_root param is true the 'child' table would be using the 'parent' identity: [2] publish_via_partition_root determines whether changes in a partitioned table (or on its partitions) contained in the publication will be published using the identity and schema of the partitioned table rather than that of the individual partitions that are actually changed. ~ So is this another bug (slightly different from the current one being patched), or is it just some different special behaviour? If it's another bug then you need to know that ASAP because I think you may want to fix both of them at the same time which might impact how this 2x data copy patch should be implemented. Or perhaps just the pgdocs need more notes about special cases/combinations like this? ====== 2. General - documentation? For this current patch, IIUC it was decided that it is a bug because the data gets duplicated, and then some sensible rule was decided that this patch should use to address it (e.g. publishing a child combined with publishing a parent via_root will just ignore the child's publication...). So my question is - is this (new/fixed) behaviour something that a user will be able to figure out themselves from the existing documentation, or does this patch now need its own special notes in the documentation? ====== 3. src/backend/catalog/pg_publication.c - pg_get_publication_tables + foreach(lc, pub_elem_tables) + { + Oid *result = (Oid *) malloc(sizeof(Oid) * 2); + + result[0] = lfirst_oid(lc); + result[1] = pub_elem->oid; + table_infos = lappend(table_infos, result); + } 3a. It looks like each element in the table_infos list is a malloced obj of 2x Oids (Oid of table, Oid of pub). IMO better to call this element 'table_info' instead of the meaningless 'result' ~ 3b. Actually, I think it would be better if this function defines a little 2-element structure {Oid relid, Oid pubid} to use instead of this array (which requires knowledge that [0] means relid and [1] means pubid). ~~~ 4. + foreach(lc, table_infos) + { + Oid *table_info_tmp = (Oid *) lfirst(lc); + + if (!list_member_oid(tables, table_info_tmp[0])) + table_infos = foreach_delete_current(table_infos, lc); } I think the '_tmp' suffix is not helpful here - IMO having another relid variable would make this more self-explanatory. Or better yet adopt the suggestion o f #3b and have a little struct with self-explanatory member names. ===== 5. src/backend/commands/subscriptioncmds.c - fetch_table_list + if (server_version >= 160000) + appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n" Since there is an else statement block, I think this would be more readable if there was a statement block here too. YMMV SUGGESTION if (server_version >= 160000) { appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n" ... } ~~~ 6. + /* + * Get the list of tables from publisher, the partition table whose + * ancestor is also in this list will be ignored, otherwise the initial + * data in the partition table would be replicated twice. + */ 6a. "from publisher, the partition" -> "from the publisher. The partition" ~ 6b. This looks like a common comment that also applied to the "if" part, so it seems more appropriate to move it to where I indicated below. Perhaps the whole comment needs a bit of massaging after you move it... + /* + * Get namespace, relname and column list (if supported) of the tables + * belonging to the specified publications. + * + * HERE <<<<<<<<< + * + * From version 16, the parameter of the function pg_get_publication_tables + * can be an array of publications. The partition table whose ancestor is + * also published in this publication array will be filtered out in this + * function. + */ ------ [1] https://www.postgresql.org/message-id/OS0PR01MB5716A30DDEECC59132E1084F94799%40OS0PR01MB5716.jpnprd01.prod.outlook.com [2] https://www.postgresql.org/docs/devel/sql-createpublication.html Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: