Re: Skipping schema changes in publication - Mailing list pgsql-hackers
From | Shlok Kyal |
---|---|
Subject | Re: Skipping schema changes in publication |
Date | |
Msg-id | CANhcyEW2LK4diNeCG862DE40yQoV3VAgf59kXUq2TuR8fnw5vQ@mail.gmail.com Whole thread Raw |
In response to | Re: Skipping schema changes in publication (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
On Mon, 30 Jun 2025 at 11:37, Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Shlok. > > Some review comments for v15-0003. > > ====== > doc/src/sgml/catalogs.sgml > > 1. > <para> > - True if the relation must be excluded > + True if the column list or relation must be excluded from publication. > + If a column list is specified in <literal>prattrs</literal>, then > + exclude only those columns. If <literal>prattrs</literal> is NULL, > + then exclude the entire relation. > </para></entry> > > I noticed other fields on this page say "null" instead of "NULL". It > seems like "null" is more conventional. > Fixed > ====== > doc/src/sgml/logical-replication.sgml > > 2. > <para> > If no column list is specified, any columns added to the table later are > automatically replicated. This means that having a column list which names > - all columns is not the same as having no column list at all. > + all columns is not the same as having no column list at all. > Similarly, if an > + column list is specified with EXCEPT, any columns added to the table later > + are also replicated automatically. > </para> > > 2a. > CURRENTLY > If no column list or a column list with EXCEPT is specified, any > columns added to the table later are automatically replicated. This > means that having a column list which names all columns is not the > same as having no column list at all. If an column list is specified, > any columns added to the table later are automatically replicated. > > ~ > > That still doesn't quite make sense. I think instead of saying "This > means..." it needs to say something a bit like below: > > However, a normal column list (without EXCEPT) only > specified columns and no more. Therefore, having a column list that > names all columns is not the same as having no column list at all, as > more columns may be added to the table later. > Fixed > ~ > > 2b. > And the final sentence "If an column list..." looks like a cut/paste error (??) > Yes it was a mistake. > ~ > > 2c. > Maybe here EXCEPT should be written as <literal>EXCEPT</literal> > Fixed. > ~~~ > > 2.5A. > The description about generated columns still says this: > > CURRENT: > Generated columns can also be specified in a column list. This allows > generated columns to be published, regardless of the publication > parameter publish_generated_columns. See Section 29.6 for details. > > ~ > > But I don't think it is quite correct. IMO gencols behaviour is much > more subtle... > > e.g. > > a) Normal collist - these named cols are published REGARDLESS of the > 'publish_generated_cols' parameter (same as before) > > b) EXCEPT collist - you can specify gencols in the list REGARDLESS of > the 'publish_generated_cols' parameter, because since they are named > as "except" then they will not be published anyhow.... > > c) BUT for EXCEPT collist case, I think any gencols that are *not* > covered by that EXCEPT collist should follow the rules according to > the 'publish_generated_cols' parameter. > > So, it is much more tricky than the docs currently say: > Modified the documentation > Also > > 2.5B. > - The text says "See Section 29.6 for details," but there are no > examples of these combinations (e.g. EXCEPT collist and diff parameter > setting) > Added documentation. > 2.5C, > - The regression tests also need to be more complex to cover these > Added tests related to these > 2.5D. > - You might need to add something in the CREATE PUBLICATION "NOTES" > section after all -- even if it just refers to here. > Added documentation > ~~~ > > 3. > <para> > Create a publication <literal>p1</literal>. A column list is defined for > - table <literal>t1</literal> to reduce the number of columns that will be > - replicated. Notice that the order of column names in the column list does > - not matter. > + table <literal>t1</literal>, and another column list is defined for table > + <literal>t2</literal> using the EXCEPT clause to reduce the number of > + columns that will be replicated. Note that the order of column names in > + the column lists does not matter. > <programlisting> > -/* pub # */ CREATE PUBLICATION p1 FOR TABLE t1 (id, b, a, d); > +/* pub # */ CREATE PUBLICATION p1 FOR TABLE t1 (id, b, a, d), t2 EXCEPT (d, a); > </programlisting></para> > > Maybe here EXCEPT should be written as <literal>EXCEPT</literal> > Fixed > ====== > doc/src/sgml/ref/create_publication.sgml > > 4. > <para> > - When a column list is specified, only the named columns are replicated. > - The column list can contain stored generated columns as well. If the > - column list is omitted, the publication will replicate all non-generated > - columns (including any added in the future) by default. Stored generated > - columns can also be replicated if > <literal>publish_generated_columns</literal> > - is set to <literal>stored</literal>. Specifying a column list has no > - effect on <literal>TRUNCATE</literal> commands. See > + When a column list without EXCEPT is specified, only the named > columns are > + replicated. The column list can contain stored generated columns as well. > + If the column list is omitted, the publication will replicate > + all non-generated columns (including any added in the future) by default. > + Stored generated columns can also be replicated if > + <literal>publish_generated_columns</literal> is set to > + <literal>stored</literal>. Specifying a column list has no effect on > + <literal>TRUNCATE</literal> commands. See > <xref linkend="logical-replication-col-lists"/> for details about column > lists. > </para> > > Maybe here EXCEPT should be written as <literal>EXCEPT</literal> > Fixed > ~~~ > > 5. > + <para> > + When a column list is specified with EXCEPT, the named columns are not > + replicated. Specifying a column list has no effect on > + <literal>TRUNCATE</literal> commands. > + </para> > > Maybe here EXCEPT should be written as <literal>EXCEPT</literal>. > Fixed > ** Note all the extra subtleties that I mentioned in the review > comment #2.5 above --- e.g. IMO any *un-listed* gencols still should > follow the parameter rules. > > ~~~ > > 6. > <para> > Any column list must include the <literal>REPLICA IDENTITY</literal> columns > - in order for <command>UPDATE</command> or <command>DELETE</command> > - operations to be published. There are no column list restrictions if the > - publication publishes only <command>INSERT</command> operations. > + and any column list specified with EXCEPT must not include the > + <literal>REPLICA IDENTITY</literal> columns in order for > + <command>UPDATE</command> or <command>DELETE</command> operations to be > + published. There are no column list restrictions if the > publication publishes > + only <command>INSERT</command> operations. > </para> > > 6a. > CURRENT: > Any column list must include the REPLICA IDENTITY columns, and any > column list specified with EXCEPT must not include the REPLICA > IDENTITY columns in order for UPDATE or DELETE operations to be > published. > > ~ > > I felt that might be better expressed the other way around. Also, it > might be better to say "not name" instead of "not include" because > EXCEPT + include seemed a bit contrary. > > > SUGGESTION (maybe like this) > In order for UPDATE or DELETE operations to work, all the REPLICA > IDENTITY columns must be published. So, any column list must name all > REPLICA IDENTITY columns, and any EXCEPT column list must not name any > REPLICA IDENTITY columns. > Fixed > ~~ > > 6b. > Maybe here EXCEPT should be written as <literal>EXCEPT</literal> > Fixed > ====== > src/backend/catalog/pg_publication.c > > check_and_fetch_column_list: > > 7. > + /* Lookup the except attribute */ > + cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, cftuple, > + Anum_pg_publication_rel_prexcept, &isnull); > + > + if (!isnull) > + { > + Assert(!pub->alltables); > + *except_columns = DatumGetBool(cfdatum); > + } > + > > I felt it would be safer to also assign *except_columns = false; > up-front so the caller could be sure this flag was meaningful on > return. > Fixed > ~~~ > > pub_form_cols_map: > > 8. > Maybe use snake case like for other params, so /excepcols/except_cols/ > Fixed > ~~~ > > pg_get_publication_tables: > > 9. > > I felt all the logic in this function maybe can be simpler: > > e.g. If you just have "Bitmapset *except_columns = NULL;" then null > nmeans there is no except columns; otherwise there is. This means you > don't need a separate 'bool except_column' variable. > > e.g. Assign the Bitmapset *except_columns after you already have the > values[2], instead of doing it later. > > e.g. The skip code if (except_columns && bms_is_member(att->attnum, > columns)) could just check the list member, I think, without the > additional bool. > > ~~~ > Fixed > 10. > + /* > + * We fetch pubtuple if publication is not FOR ALL TABLES and not > + * FOR TABLES IN SCHEMA. So if prexcept is true, it indicate that > + * prattrs contains columns to be excluded for replication. > + */ > + if (!isnull) > + except_columns = DatumGetBool(exceptDatum); > > > /indicate/indicates/ > Fixed > ====== > src/backend/parser/gram.y > > 11. > + | TABLE relation_expr EXCEPT opt_except_column_list OptWhereClause > + { > + $$ = makeNode(PublicationObjSpec); > + $$->pubobjtype = PUBLICATIONOBJ_TABLE; > + $$->pubtable = makeNode(PublicationTable); > + $$->pubtable->relation = $2; > + $$->pubtable->columns = $4; > + $$->pubtable->whereClause = $5; > + $$->pubtable->except = true; > + $$->location = @1; > + } > > I wasn't expecting you would need another 'opt_except_column_list' and > all the code duplication that causes. AFAIK, the syntax is identical > for 'opt_column_list' apart from the preceding EXCEPT so I thought all > you need is to allow the 'opt_column_list' to have an optional EXCEPT > qualifier. > The main reason I used a separate 'opt_except_column_list' is because 'opt_column_list' can also be NULL. But the column list specified with EXCEPT not be NULL. So, 'opt_except_column_list' is defined such that it cannot be null. > ====== > src/backend/replication/pgoutput/pgoutput.c > > 12. > + > + /* > + * Indicates whether no columns are published for a given relation. With > + * the introduction of the EXCEPT clause in column lists, it is now > + * possible to define a publication that excludes all columns of a table. > + * However, the 'columns' attribute cannot represent this case, since a > + * NULL value implies that all columns are published. To distinguish this > + * scenario, the 'no_cols_published' flag is introduced. > + */ > + bool no_cols_published; > } RelationSyncEntry; > > But, what about when Bitmapset *columns is not null, but has no bits > set -- doesn't that mean the same as "no columns"? > I think this is possible. A bitmapset which has no set bit is NULL. I saw following comment in bitmapset.c "By convention, we always represent a set with * the minimum possible number of words, i.e, there are never any trailing * zero words. Enforcing this requires that an empty set is represented as * NULL. Because an empty Bitmapset is represented as NULL, a non-NULL * Bitmapset always has at least 1 Bitmapword." > ====== > src/include/catalog/pg_publication.h > > 13. > extern Bitmapset *pub_form_cols_map(Relation relation, > - PublishGencolsType include_gencols_type); > + PublishGencolsType include_gencols_type, > + Bitmapset *exceptcols); > > Maybe snake-case like the other params: /exceptcols/except_cols/ > Fixed > ====== > src/test/regress/sql/publication.sql > > 14. > +-- Verify that publication is created with EXCEPT > +CREATE PUBLICATION testpub_except FOR TABLE pub_test_except1, > pub_sch1.pub_test_except2 EXCEPT (b, c); > +SELECT * FROM pg_publication_tables WHERE pubname = 'testpub_except'; > + > > I think tests should also use psql \dRp+ commands in places to show > that the "describe" stuff is working correctly. > > ~~~ Fixed > > 15. > +-- Check for invalid cases > +CREATE PUBLICATION testpub_except2 FOR TABLES IN SCHEMA pub_sch1, > TABLE pub_test_except1 EXCEPT (b, c); > +CREATE PUBLICATION testpub_except2 FOR TABLE pub_test_except1 EXCEPT; > > Should explain more about what you are testing here: > a) cannot use EXCEPT col-lists combined with TABLES IN SCHEMA > b) syntax error EXCEPT without a col-list > > ~~~ fixed > > 16. > +-- Verify that publication can be altered with EXCEPT > +ALTER PUBLICATION testpub_except SET TABLE pub_test_except1 EXCEPT > (a, b), pub_sch1.pub_test_except2; > +SELECT * FROM pg_publication_tables WHERE pubname = 'testpub_except'; > > The comment is a bit misleading because there are many kinds of > "alter". Maybe say more like > Verify ok - ALTER PUBLICATION ... SET ... EXCEPT (col-list) > > ~~~ Fixed > > 17. > +-- Verify ALTER PUBLICATION ... DROP > +ALTER PUBLICATION testpub_except DROP TABLE pub_test_except1 EXCEPT (a, b); > +ALTER PUBLICATION testpub_except DROP TABLE pub_test_except1; > > Should explain more: > +-- Verify fails - ALTER PUBLICATION ... DROP ... EXCEPT (col-list) > +-- Verify ok - ALTER PUBLICATION ... DROP ... > > ~~~ Fixed > > 18. > +ALTER PUBLICATION testpub_except ADD TABLE pub_test_except1 EXCEPT (c, d); > +SELECT * FROM pg_publication_tables WHERE pubname = 'testpub_except'; > > Missing comment: > +-- Verify ok - ALTER PUBLICATION ... ADD ... EXCEPT (col-list) > > ~~~ Fixed > > 19. > +-- Verify excluded columns cannot be part of REPLICA IDENTITY > +ALTER TABLE pub_test_except1 REPLICA IDENTITY FULL; > +UPDATE pub_test_except1 SET a = 3 WHERE a = 1; > > +CREATE UNIQUE INDEX pub_test_except1_a_idx ON pub_test_except1 (a, c); > +ALTER TABLE pub_test_except1 REPLICA IDENTITY USING INDEX > pub_test_except1_a_idx; > +UPDATE pub_test_except1 SET a = 3 WHERE a = 1; > > +DROP INDEX pub_test_except1_a_idx; > +CREATE UNIQUE INDEX pub_test_except1_a_idx ON pub_test_except1 (a); > +ALTER TABLE pub_test_except1 REPLICA IDENTITY USING INDEX > pub_test_except1_a_idx; > +UPDATE pub_test_except1 SET a = 3 WHERE a = 1; > + > +DROP INDEX pub_test_except1_a_idx; > > 19a. > IIUC, really there are multiple tests here, so I think it should all > be split and commented separately. > > a) Verify that EXCEPT col-list cannot contain RI cols (when using RI FULL) > b) Verify that EXCEPT col-list cannot contain RI cols (when using INDEX) > c) Verify that so long as no clash between RI cols and the EXCEPT > col-list, then it is ok > > ~ Fixed > > 19b. > IMO, some index names could be better: > > CREATE UNIQUE INDEX pub_test_except1_a_idx ON pub_test_except1 (a, c); > How about 'pub_test_except1_ac_idx'? > > ~~~ > Fixed > 20. > +DROP PUBLICATION testpub_except; > +DROP TABLE pub_test_except1; > +DROP TABLE pub_sch1.pub_test_except2; > > Add a "cleanup" comment. > Added I have addressed the comments and added the latest v16. Thanks and Regards, Shlok Kyal
Attachment
pgsql-hackers by date: