Re: Skipping schema changes in publication - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Skipping schema changes in publication |
Date | |
Msg-id | CAHut+PsXP_61ZXuVOx5u9FZGK3oH4taaA59oOzgqyygZx8ezWw@mail.gmail.com Whole thread Raw |
In response to | Re: Skipping schema changes in publication (Shlok Kyal <shlok.kyal.oss@gmail.com>) |
List | pgsql-hackers |
Hi Shlok. Below are some review comments for v14-0003 ====== 1. GENERAL Since the new syntax uses EXCEPT, then, in my opinion, you should try to use that same term where possible when describing things. I understand it is hard to do this in text and I agree often it makes more sense to say "exclude" columns etc, but OTOH in the code there are lots of places where you could have named vars/params differently: e.g. 'except_collist' instead of 'exclude_collist' might have been better. ====== Commit message 2. Column list specifed with EXCEPT is stored in column "prattrs" in table "pg_publication_rel" and also column "prexcept" is set to "true", to maintain the column list that user wants to exclude from the publication. ~ That paragraph could do with some rewording. For example, AFAIK, "prattrs" is for all column lists -- not just except col-lists, but the way it is described here sounds different. Also, /specifed/specified/ ====== doc/src/sgml/catalogs.sgml 3. (52.42. pg_publication_rel) <para> - True if the relation must be excluded + True if the relation or column list must be excluded. If publication is + created <literal>FOR ALL TABLES</literal> and it is specified as true, + the relation should be excluded. Else if it is true the columns in + <literal>prattrs</literal> should be excluded from being published. </para></entry> I felt this could be expressed more simply without mentioning anything about FOR ALL TABLES. SUGGESTION True if the column list or relation must be excluded from publication. If a column list is specified in <literal>prattrs</literal>, then exclude only those columns. If <literal>prattrs</literal> is NULL, then exclude the entire relation. ====== doc/src/sgml/logical-replication.sgml 4. (29.5. Column Lists) <para> - Each publication can optionally specify which columns of each table are - replicated to subscribers. The table on the subscriber side must have at - least all the columns that are published. If no column list is specified, - then all columns on the publisher are replicated. + Each publication can optionally specify which columns of each table should be + replicated or excluded from replication. On the subscriber side, the table + must include at least all the columns that are published. If no column list + is provided, all columns from the publisher are replicated by default. See <xref linkend="sql-createpublication"/> for details on the syntax. </para> I felt this patch may have changed too much text. IMO, you only needed to say "... are replicated or excluded from replication.". The other changes did not seem necessary. ~~~ 5. <para> - If no column list is specified, any columns added to the table later are - automatically replicated. This means that having a column list which names - all columns is not the same as having no column list at all. + If no column list or a column list with EXCEPT is specified, any columns + added to the table later are automatically replicated. This means that having + a column list which names all columns is not the same as having no + column list at all. If an column list is specified, any columns added to the + table later are automatically replicated. </para> 5a. "This means that having a column list which names all columns is not the same as having no column list at all." -- That note does not make sense when you say EXCEPT. I think some rewording is needed here. ~ 5b. "If an column list is specified, any columns added to the table later are automatically replicated.". This made no sense -- some words missing? ~~~ 6. Generated columns can also be specified in a column list. This allows generated columns to be published, regardless of the publication parameter <link linkend="sql-createpublication-params-with-publish-generated-columns"> - <literal>publish_generated_columns</literal></link>. See - <xref linkend="logical-replication-gencols"/> for details. + <literal>publish_generated_columns</literal></link>. Generated columns can + be included in column list specified with EXCEPT clause if publication + parameter + <link linkend="sql-createpublication-params-with-publish-generated-columns"> + <literal>publish_generated_columns</literal></link> is not set to + <literal>none</literal>. Specified generated columns will not be published. + See <xref linkend="logical-replication-gencols"/> for details. </para> I am not so sure about this. It seemed overly strict to me. Why can't it simply say: "Generated columns can also be specified in a column list. This allows generated columns to be published or excluded, regardless of the publication parameter..." Specifically, I don't know why you need to say: Generated columns can be included in column list specified with EXCEPT clause if publication parameter publish_generated_columns is not set to none. Specified generated columns will not be published. IIUC, then EXCEPT (gencol1, gencol2) is saying to exclude the named cols. So if param is "stored", then the named cols will be excluded. OTOH, if param is "none" then all generated cols will be excluded anyway, so why not just allow the EXCEPT (gencol,gencol2) here as well, because the result will be the same. ~~~ 7. (29.5.1. Examples) <para> - Create a table <literal>t1</literal> to be used in the following example. + Create tables <literal>t1</literal>, <literal>t2</literal> to be used in the + following example. /Create tables t1, t2/Create tables t1 and t2/ ~~~ 8. <para> Create a publication <literal>p1</literal>. A column list is defined for - table <literal>t1</literal> to reduce the number of columns that will be - replicated. Notice that the order of column names in the column list does - not matter. + table <literal>t1</literal> and a column list is defined for table + <literal>t2</literal> with EXCEPT clause to reduce the number of columns that will be + replicated. Notice that the order of column names in the column lists does not matter. BEFORE A column list is defined for table t1 and a column list is defined for table t2... SUGGESTION (added comma, etc.) A column list is defined for table t1, and another column list is defined for table t2... ~~~ 9. The final example still says: "Only data from the column list of publication p1 is replicated." That doesn't seem quite appropriate now that you also have an EXCEPT column list. SUGGESTION: Only data specified by the column lists of publication p1 is replicated. ====== doc/src/sgml/ref/create_publication.sgml 10. + <para> + When a column list is specified with EXCEPT, the named columns are not + replicated. Specifying a column list has no effect on + <literal>TRUNCATE</literal> commands. + </para> I felt that to be clearer the preceding paragraph should be changed as follows: /When a column list is specified, only the named columns are replicated./When a column list without EXCEPT is specified, only the named columns are replicated./ ~~~ 11. CREATE PUBLICATION (NOTES section) 11a. The NOTES talk about replica identity columns -- should you mention EXCEPT here? ~ 11b. The NOTES talk about generated columns -- should you mention EXCEPT here? ====== src/backend/catalog/pg_publication.c 12. check_and_fetch_column_list + if (!isnull) + except = DatumGetBool(cfdatum); + + *except_columns = except && !pub->alltables; AFAICT, you can Assert(!pub->alltables) because you already checked that earlier up front. So you don't need 'except' var either. Just assign *except_cols up front and then overwrite it later if true. SUGGESTION: *except_cols = false; if (pub->alltables) return false; ... if (!isnull) *except_cols = DatumGetBool(cfdatum); ~~~ 13. publication_add_relation /* Validate and translate column names into a Bitmapset of attnums. */ - attnums = pub_collist_validate(pri->relation, pri->columns); + attnums = pub_collist_validate(pri->relation, pri->columns, + pri->except && !pub->alltables, + pub->pubgencols_type); I am wondering why we are even calling a function to validate column lists if pub->alltables was true. AFAIK, that combination of column-lists and FOR ALL TABLES is not even possible, so the code seems strange. ~~~ 14. pub_exclude_collist_validate . + /* + * Check if column list specified with EXCEPT have any stored + * generated column and 'publish_generated_columns' is not set to + * 'stored'. + */ + if (except_columns && + TupleDescAttr(tupdesc, attnum - 1)->attgenerated == ATTRIBUTE_GENERATED_STORED && + pubgencols_type != PUBLISH_GENCOLS_STORED) + ereport(ERROR, + errcode(ERRCODE_INVALID_COLUMN_REFERENCE), + errmsg("cannot use stored generated column \"%s\" in publication column list specified with EXCEPT when \"%s\" set to \"%s\"", + colname, "publish_generated_columns", "stored")); As mentioned in the above DOCS comments, I was having doubts about why we have this error. If the parameter says "none", then generated columns will not be replicated, so why should we care if the user also says EXCEPT(gencol1,gencol2). Either way, the result will be the same; the generated column will not be published. ~~~ 15. GetRelationPublications { HeapTuple tup = &pubrellist->members[i]->tuple; Oid pubid = ((Form_pg_publication_rel) GETSTRUCT(tup))->prpubid; + HeapTuple pubtup = SearchSysCache1(PUBLICATIONOID, ObjectIdGetDatum(pubid)); + bool is_table_excluded = ((Form_pg_publication) GETSTRUCT(pubtup))->puballtables && + ((Form_pg_publication_rel) GETSTRUCT(tup))->prexcept; - if (except_flag == ((Form_pg_publication_rel) GETSTRUCT(tup))->prexcept) + if (except_flag == is_table_excluded) result = lappend_oid(result, pubid); + + ReleaseS I'm not 100% sure you need the additional 'pubtup'... Can't you just look at the "prattrs" field to see if a column-list was specified? If "prattrs" is null and "prexcept" is true, isn't that the same combination as what you are looking for here? ~~~ 16. pg_get_publication_tables + columnsDatum = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple, + Anum_pg_publication_rel_prattrs, + &(nulls[2])); + + /* if column list is specified with EXCEPT */ + if (!pub->alltables && except) + columns = pub_collist_to_bitmapset(NULL, columnsDatum, NULL); + else + values[2] = columnsDatum; 16a. Something seems fishy here. Isn't there a pathway where you missed assigning value[2] to anything? ~ 16b. Also, I feel there should be some other boolean variable used here instead of checking bot (!pub->alltables && except) in multiple places. ====== src/backend/replication/pgoutput/pgoutput.c 17. RelationSyncEntry + + /* Indicate if no column is included in the publication */ + bool no_cols_published; Maybe this can have a more explanatory comment to explain why it is needed? ~~~ 18. check_and_init_gencol + bool found = false; + bool except_columns = false; + + found = check_and_fetch_column_list(pub, entry->publish_as_relid, NULL, + NULL, &except_columns); + /* * The column list takes precedence over the * 'publish_generated_columns' parameter. Those will be checked later, - * see pgoutput_column_list_init. + * see pgoutput_column_list_init. But when a column list is specified + * with EXCEPT, it should be checked. */ - if (check_and_fetch_column_list(pub, entry->publish_as_relid, NULL, NULL)) + if (found && !except_columns) continue; The variable 'found' seems a poor name; how about 'has_column_list' or similar? ~~~ 19. pgoutput_change + /* + * If all columns of a table is present in column list specified with + * EXCEPT, skip publishing the changes. + */ + if (relentry->no_cols_published) + return; /is present/are present/ ====== src/bin/pg_dump/pg_dump.c 20. getPublicationTables + if (strcmp(prexcept, "t") == 0 && PQgetisnull(res, i, i_prattrs)) pubrinfo[j].dobj.objType = DO_PUBLICATION_EXCEPT_REL; + else + pubrinfo[j].dobj.objType = DO_PUBLICATION_REL; pubrinfo[j].dobj.catId.tableoid = atooid(PQgetvalue(res, i, i_tableoid)); @@ -4797,6 +4797,7 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables) pubrinfo[j].pubrelqual = NULL; else pubrinfo[j].pubrelqual = pg_strdup(PQgetvalue(res, i, i_prrelqual)); + pubrinfo[j].pubexcept = (strcmp(prexcept, "t") == 0); Why not assign pubrinfo[j].pubexcept earlier so you don't have to repeat the strcmp? ~~~ 21. - if (strcmp(prexcept, "t") == 0) + if (strcmp(prexcept, "t") == 0 && PQgetisnull(res, i, i_prattrs)) simple_ptr_list_append(&exceptinfo, &pubrinfo[j]); Why not assign pubrinfo[j].pubexcept earlier so you don't have to repeat the strcmp? Same also for the PQgetisnull(res, i, i_prattrs))... ~~~ 22. dumpPublicationTable if (pubrinfo->pubrattrs) - appendPQExpBuffer(query, " (%s)", pubrinfo->pubrattrs); + { + if (pubrinfo->pubexcept) + appendPQExpBuffer(query, " EXCEPT (%s)", pubrinfo->pubrattrs); + else + appendPQExpBuffer(query, " (%s)", pubrinfo->pubrattrs); + } SUGGESTION { if (pubrinfo->pubexcept) appendPQExpBuffer(query, " EXCEPT"); appendPQExpBuffer(query, " (%s)", pubrinfo->pubrattrs); } ====== Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: