Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Data is copied twice when specifying both child and parent table in publication |
Date | |
Msg-id | CAHut+Ptk6oANo2-mw5191aU0HHP9YKjYhWgTZYU0g0Y1Co_5SQ@mail.gmail.com Whole thread Raw |
In response to | RE: Data is copied twice when specifying both child and parent table in publication ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>) |
Responses |
RE: Data is copied twice when specifying both child and parent table in publication
|
List | pgsql-hackers |
Here are some review comments for patch code of HEAD_v19-0001 ====== doc/src/sgml/ref/create_publication.sgml 1. + <para> + There can be a case where a subscription combines multiple + publications. If a root partitioned table is published by any + subscribed publications which set + <literal>publish_via_partition_root</literal> = true, changes on this + root partitioned table (or on its partitions) will be published using + the identity and schema of this root partitioned table rather than + that of the individual partitions. + </para> 1a. The paragraph prior to this one just refers to "partitioned tables" instead of "root partitioned table", so IMO we should continue with the same terminology. I also modified the remaining text slightly. AFAIK my suggestion conveys exactly the same information but is shorter. SUGGESTION There can be a case where one subscription combines multiple publications. If any of those publications has set <literal>publish_via_partition_root</literal> = true, then changes in a partitioned table (or on its partitions) will be published using the identity and schema of the partitioned table. ~ 1b. Shouldn't that paragraph (or possibly somewhere in the CREATE SUBSCRIPTION notes?) also explain that in this scenario the logical replication will only publish one set of changes? After all, this is the whole point of the patch, but I am not sure if the user will know of this behaviour from the current documentation. ====== src/backend/catalog/pg_publication.c 2. filter_partitions BEFORE: static void filter_partitions(List *table_infos) { ListCell *lc; foreach(lc, table_infos) { bool skip = false; List *ancestors = NIL; ListCell *lc2; published_rel *table_info = (published_rel *) lfirst(lc); if (get_rel_relispartition(table_info->relid)) ancestors = get_partition_ancestors(table_info->relid); foreach(lc2, ancestors) { Oid ancestor = lfirst_oid(lc2); /* Is ancestor exists in the published table list? */ if (is_ancestor_member_tableinfos(ancestor, table_infos)) { skip = true; break; } } if (skip) table_infos = foreach_delete_current(table_infos, lc); } } ~ 2a. My previous review [1] (see #1) suggested putting most code within the condition. AFAICT my comment still is applicable but was not yet addressed. 2b. IMO the comment "/* Is ancestor exists in the published table list? */" is unnecessary because it is already clear what is the purpose of the function named "is_ancestor_member_tableinfos". SUGGESTION static void filter_partitions(List *table_infos) { ListCell *lc; foreach(lc, table_infos) { if (get_rel_relispartition(table_info->relid)) { bool skip = false; ListCell *lc2; published_rel *table_info = (published_rel *) lfirst(lc); List *ancestors = get_partition_ancestors(table_info->relid); foreach(lc2, ancestors) { Oid ancestor = lfirst_oid(lc2); if (is_ancestor_member_tableinfos(ancestor, table_infos)) { skip = true; break; } } if (skip) table_infos = foreach_delete_current(table_infos, lc); } } } ====== src/backend/commands/subscriptioncmds.c 3. fetch_table_list(WalReceiverConn *wrconn, List *publications) { WalRcvExecResult *res; - StringInfoData cmd; + StringInfoData cmd, + pub_names; TupleTableSlot *slot; Oid tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID}; List *tablelist = NIL; - bool check_columnlist = (walrcv_server_version(wrconn) >= 150000); + int server_version = walrcv_server_version(wrconn); + bool check_columnlist = (server_version >= 150000); + + initStringInfo(&pub_names); + get_publications_str(publications, &pub_names, true); initStringInfo(&cmd); - appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n"); - /* Get column lists for each relation if the publisher supports it */ - if (check_columnlist) - appendStringInfoString(&cmd, ", t.attnames\n"); + /* Get the list of tables from the publisher. */ + if (server_version >= 160000) + { ~ I think the 'pub_names' is only needed within that ">= 160000" condition. So all the below code can be moved into that scope can't it? + StringInfoData pub_names; + initStringInfo(&pub_names); + get_publications_str(publications, &pub_names, true); + pfree(pub_names.data); ------ [1] https://www.postgresql.org/message-id/CAHut%2BPuNsvO9o9XzeJuSLsAsndgCKVphDPBRqYuOTy2bR28E%2Bg%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: