Re: Skipping schema changes in publication - Mailing list pgsql-hackers

From Shlok Kyal
Subject Re: Skipping schema changes in publication
Date
Msg-id CANhcyEWg2WbEW_fFwk0D3J2KBrUF7th6VrE+gvESgkUKP9VpZg@mail.gmail.com
Whole thread Raw
In response to Re: Skipping schema changes in publication  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Thu, 11 Dec 2025 at 04:10, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shlok -
>
> Here are some review comments for v31-0001 (EXCEPT (tablelist))
>
> ======
> Commit message
>
> 1.
> The new syntax allows specifying excluded relations when creating or altering
> a publication. For example:
> CREATE PUBLICATION pub1 FOR ALL TABLES EXCEPT TABLE (t1,t2);
>
> ~
>
> In v30, you removed all the ALTER PUBLICATION changes, so the "or
> altering" in the message above also needs to be removed.
>
> ======
> doc/src/sgml/logical-replication.sgml
>
> 2.
>    <para>
> -   To add tables to a publication, the user must have ownership rights on the
> -   table. To add all tables in schema to a publication, the user must be a
> -   superuser. To create a publication that publishes all tables, all tables in
> -   schema, or all sequences automatically, the user must be a superuser.
> +   To create a publication using <literal>FOR ALL TABLES</literal>,
> +   <literal>FOR ALL SEQUENCES</literal> or
> +   <literal>FOR TABLES IN SCHEMA</literal>, the user must be a
> superuser. To add
> +   <literal>ALL TABLES</literal> or <literal>TABLES IN SCHEMA</literal> to a
> +   publication, the user must be a superuser. To add tables to a publication,
> +   the user must have ownership rights on the table.
>    </para>
>
> This is a good improvement, but I was not sure why it is in this
> patch. Should it be a separate thread for a docs improvement?
>
> ======
> src/backend/catalog/pg_publication.c
>
> GetTopMostAncestorInPublication:
>
> 3.
> {
>   Oid ancestor = lfirst_oid(lc);
> - List    *apubids = GetRelationPublications(ancestor);
> - List    *aschemaPubids = NIL;
> + List    *apubids = NIL;
> + List    *aexceptpubids = NIL;
> + List    *aschemapubids = NIL;
> + bool set_top = false;
> +
> + GetRelationPublications(ancestor, &apubids, &aexceptpubids);
>
>   level++;
>
> - if (list_member_oid(apubids, puboid))
> + /* check if member of table publications */
> + set_top = list_member_oid(apubids, puboid);
> + if (!set_top)
>   {
> - topmost_relid = ancestor;
> + aschemapubids = GetSchemaPublications(get_rel_namespace(ancestor));
>
> - if (ancestor_level)
> - *ancestor_level = level;
> + /* check if member of schema publications */
> + set_top = list_member_oid(aschemapubids, puboid);
> +
> + /*
> + * If the publication is all tables publication and the table is
> + * not part of exception tables.
> + */
> + if (!set_top && puballtables)
> + set_top = !list_member_oid(aexceptpubids, puboid);
>   }
> - else
> +
> + if (set_top)
>   {
> - aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
> - if (list_member_oid(aschemaPubids, puboid))
> - {
> - topmost_relid = ancestor;
> + topmost_relid = ancestor;
>
> - if (ancestor_level)
> - *ancestor_level = level;
> - }
> + if (ancestor_level)
> + *ancestor_level = level;
>   }
>
>   list_free(apubids);
> - list_free(aschemaPubids);
> + list_free(aschemapubids);
> + list_free(aexceptpubids);
>   }
>
> That 'aschemapubids' can be declared and freed within the if block.
>
> ~~~
>
> publication_add_relation:
>
> 4.
> + /*
> + * Check when a partition is excluded via EXCEPT TABLE while the
> + * publication has publish_via_partition_root = true.
> + */
> + if (pub->alltables && pub->pubviaroot && pri->except &&
> + targetrel->rd_rel->relispartition)
> + ereport(WARNING,
>
>
> This comment doesn't sound quite right:
>
> SUGGESTION
> Handle the case where a partition is excluded by EXCEPT TABLE while
> publish_via_partition_root = true.
>
> ~~~
>
> 5.
> + /*
> + * Check when a partitioned table is excluded via EXCEPT TABLE while the
> + * publication has publish_via_partition_root = false.
> + */
> + if (pub->alltables && !pub->pubviaroot && pri->except &&
> + targetrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> + ereport(WARNING,
>
> Ditto. Reword like suggested in the previous review comment.
>
> ~~~
>
> 6.
> +/*
> + * Get the list of publication oids associated with a specified relation.
> + * pubids is filled with the list of publication oids the relation is part of.
> + * except_pubids is filled with the list of publication oids the relation is
> + * excluded from.
> + *
> + * This function returns true if the relation is part of any publication.
> + */
>
> Maybe putting 'pubids' and 'except_pubids' in single quotes will help
> readability of this comment?
>
> Also, these are already Lists, so they are not filled with lists.
>
> SUGGESTION
> Parameter 'pubids' returns the OIDs of the publications the relation is part of.
> Parameter 'except_pubids' returns the OIDs of publications the
> relation is excluded from.
>
> ~~~
>
> GetPublicationRelations:
>
> 7.
>  /*
> - * Gets list of relation oids for a publication.
> + * Return the list of relation OIDs for a publication.
> + *
> + * For a FOR ALL TABLES publication, this returns the list of tables that were
> + * explicitly excluded via an EXCEPT TABLE clause.
> + *
> + * For a FOR TABLE publication, this returns the list of tables explicitly
> + * included in the publication.
>   *
> - * This should only be used FOR TABLE publications, the FOR ALL
> TABLES/SEQUENCES
> - * should use GetAllPublicationRelations().
> + * Publications declared with FOR ALL TABLES or FOR ALL SEQUENCES should use
> + * GetAllPublicationRelations() to obtain the complete set of tables covered by
> + * the publication.
>   */
>  List *
>  GetPublicationRelations(Oid pubid, PublicationPartOpt pub_partopt)
>
> 7a.
> The function is called 'GetPublicationRelations', so it seems
> unintuitive that it sometimes returns the list of all the tables that
> are *excluded* from the publication. If you are going to have one
> single function that does everything, then IMO it might be better to
> hide that behind some wrapper functions like:
> GetPublicationMemberRelations
> GetPublicationExcludedRelations
>
> Consider also that all these assumptions might be OK today but they
> won't be OK in the future. e.g. One day, when named FOR SEQUENCE
> sq1,sq2 are supported then you will be alble to write a command like
> FOR ALL TABLES EXCEPT (t1), FOR SEQUENCE sq1,sq2. That's going to be a
> muddle of some included and some excluded relations. So, it is better
> to cater for that scenario now, rather than have to rewrite all of
> this function again in the future. e.g. Maybe instead of this function
> returning one list it is better to return included/excluded Lists or
> relations as output parameters?
>
> ~
>
> 7b.
> Also, comments like "Publications declared with FOR ALL TABLES or FOR
> ALL SEQUENCES should use..." seems like too many assumptions are being
> made. It would be better to enforce the calling requirements using
> parameter checking and Asserts instead instead of hoping that callers
> are going to abide by the comments.
>
> ~~~
>
> GetAllPublicationRelations:
>
> 8.
> + exceptlist = GetPublicationRelations(pubid, pubviaroot ?
> + PUBLICATION_PART_ALL :
> + PUBLICATION_PART_ROOT);
>
> This is similar to the above review comment. I'm not sure how you can
> just assume that this must be the "except list" -- AFAICT this assumes
> that 'GetAllPublicationRelations' can only be called by FOR ALL TABLES
> (???). Seems like a lot of assumptions, that would be much better to
> be enforced by Asserts in the code.
>
I agree with comments 7 and 8. I have added two functions
'GetPublicationIncludedRelations' and
'GetPublicationExcludedRelations'. To get Relations which are included
or excluded in a publication.
Both functions will call 'GetPublicationRelationsInternal' function. I
have also reintroduced the 'except_flag' variable

> ======
> src/backend/commands/publicationcmds.c
>
> pub_rf_contains_invalid_column:
>
> 9.
>  bool
>  pub_rf_contains_invalid_column(Oid pubid, Relation relation, List *ancestors,
> -    bool pubviaroot)
> +    bool pubviaroot, bool puballtables)
>
> I felt that 'puballtables' is more "important" than 'pubviaroot' so
> maybe it should come earlier in the parameter list. (e.g. make it more
> similar to 'pub_contains_invalid_column')
>
> ======
> src/backend/parser/gram.y
>
> 10.
> + pub_except_obj_list opt_except_clause
>
> I felt that 'opt_except_clause' should better be called
> 'opt_pub_except_clause' or 'pub_opt_except_clause' because without
> 'pub' it is a bit vague.
>
I agree. I prefer 'opt_pub_except_clause'. By looking at other
variables it better make sense to start the variable name with 'opt_'
as it indicates that it is optional.
Made changes for the same.

> ~~~
>
> 11.
> +/*
> + * ALL TABLES EXCEPT ( table_name [, ...] ) specification
> + */
>
> 11a
> This comment should be up where all the other CREATE PUBLICATION
> syntax is commented.
>
> ~
>
> 11b.
> Also, there is a missing optional "[TABLE]" part.
>
> ~~~
>
> 12.
> +pub_except_obj_list: PublicationExceptObjSpec
> + { $$ = list_make1($1); }
> + | pub_except_obj_list ',' PublicationExceptObjSpec
> + { $$ = lappend($1, $3); }
> + ;
> +
> +opt_except_clause:
> + EXCEPT opt_table '(' pub_except_obj_list ')' { $$ = $4; }
> + | /*EMPTY*/ { $$ = NIL; }
> + ;
>
> I felt the clause should be defined before the obj list because that
> seems the natural order to read these.
>
> ======
> src/bin/pg_dump/pg_dump.c
>
> 13.
> +static SimplePtrList exceptinfo = {NULL, NULL};
>
> Having this as global seems a bit hacky. It has nothing in common with
> all the other nearby lists, which are commented as being based on
> "patterns given by command-line switches"
>
I agree, I have added it in the PublicationInfo struct and made the
corresponding code changes.

> ~~~
>
> dumpPublication:
>
> 14.
> + /* Include exception tables if the publication has EXCEPT TABLEs */
> + for (SimplePtrListCell *cell = exceptinfo.head; cell; cell = cell->next)
> + {
> + PublicationRelInfo *pubrinfo = (PublicationRelInfo *) cell->ptr;
> + TableInfo  *tbinfo;
> +
> + if (pubinfo == pubrinfo->publication)
> + {
> + tbinfo = pubrinfo->pubtable;
>
> That 'tbinfo' can be declared within the "if".
>
> ~~~
>
> 15.
> + appendPQExpBuffer(query, "ONLY %s", fmtQualifiedDumpable(tbinfo));
>
> ONLY is not the default. How did you decide that "ONLY" is the correct
> thing to do here?
>
For pg_dump for publication we use "ONLY" by default while specifying the table

For Alter publication we use similar thing:
```
  appendPQExpBuffer(query, "ALTER PUBLICATION %s ADD TABLE ONLY",
            fmtId(pubinfo->dobj.name));
```

Also if we specify a parent table in a publication(without ONLY) all
its child tables are also added to the pg_publication_rel table.
So when we dump such a publication we get something like:
.... EXCEPT TABLE(ONLY parent_table, ONLY child_table)...

> ~~~
>
> getPublicationTables:
>
> 16.
> - pubrinfo[j].dobj.objType = DO_PUBLICATION_REL;
> + if (prexcept)
> + pubrinfo[j].dobj.objType = DO_PUBLICATION_EXCEPT_REL;
> + else
> + pubrinfo[j].dobj.objType = DO_PUBLICATION_REL;
> +
>
> Would a single assignment (ternary) make this code simpler and easier to read?
>
> SUGGESTION
> pubrinfo[j].dobj.objType = prexcept ?
>   DO_PUBLICATION_EXCEPT_REL :
>   DO_PUBLICATION_REL;
>
> ======
> src/bin/pg_dump/t/002_pg_dump.pl
>
> 17.
> + 'CREATE PUBLICATION pub10' => {
> + create_order => 50,
> + create_sql =>
> +   'CREATE PUBLICATION pub10 FOR ALL TABLES EXCEPT TABLE
> (dump_test.test_table_generated);',
> + regexp => qr/^
> + \QCREATE PUBLICATION pub10 FOR ALL TABLES EXCEPT TABLE (ONLY
> dump_test.test_table_generated, ONLY
> dump_test.test_table_generated_child2, ONLY
> dump_test.test_table_generated_child1) WITH (publish = 'insert,
> update, delete, truncate');\E
> + /xm,
> + like => { %full_runs, section_post_data => 1, },
> + },
> +
>
> These "generated" names seem unusual. I saw there are some other
> tables like 'dump_test.test_inheritance_child' and
> 'dump_test.test_inheritance_parent'. Can you use those more normal
> table names instead?
>
> Also curious - does the order of the tests matter? I saw that the
> CREATE TABLE tests seem to be coming after the CREATE PUBLICATION
> tests that are using them.
>
I looked into it and came to the conclusion that this is controlled
using 'create_order' while specifying the tests.
Tests with a lower create_order value are executed earlier.
So to ensure 'CREATE PUBLICATION' runs correctly we have to make sure
the 'create_order' of these statements is higher than that of the
respective 'CREATE TABLE' statement.

> ~~~
> 18.
> - if (!defined($tests{$test}->{all_runs})
> + if (   !defined($tests{$test}->{all_runs})
>
> Why add this whitespace?
>
pg_perltidy makes this change. I have reverted it.

> ======
> src/include/nodes/parsenodes.h
>
> 19.
>   AP_SetObjects, /* set list of objects */
> + AP_Reset, /* reset the publication */
>  } AlterPublicationAction;
>
> AFAIK, you removed all ALTER command changes from v30-0001. So this
> should not be here.
>
> ~~~
>
> 20.
> + bool for_all_tables; /* Special publication for all tables in db */
>   AlterPublicationAction action; /* What action to perform with the given
>   * objects */
>  } AlterPublicationStmt;
>
> AFAIK, you removed all ALTER command changes from v30-0001. So this
> should not be here.
>
> ======
> src/test/regress/sql/publication.sql
>
> 21.
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION testpub_foralltables_excepttable FOR ALL TABLES
> EXCEPT TABLE (testpub_tbl1, testpub_tbl2);
> +-- specify EXCEPT without TABLE
> +CREATE PUBLICATION testpub_foralltables_excepttable1 FOR ALL TABLES
> EXCEPT (testpub_tbl1);
>
> Should be 2 comments here for the 2x CREATE:
>
> # Exclude tables using FOR ALL TABLES EXCEPT TABLE (tablelist)
>
> # Exclude tables using FOR ALL TABLES EXCEPT (tablelist)
>
> ~~~
>
> 22.
>  CREATE TABLE testpub_tbl3 (a int);
>  CREATE TABLE testpub_tbl3a (b text) INHERITS (testpub_tbl3);
>
> If you rename these tables like 'testpub_tbl_parent' and
> 'testpub_tbl_child', it will be much easier to see what is going on.
>
> ~~~
>
> 23.
> +CREATE PUBLICATION testpub5 FOR ALL TABLES EXCEPT TABLE (testpub_tbl3);
>
> Missing comment -- something like:
> # Exclude parent table, omitting both of 'ONLY' and '*'
>
> ~~~
>
> 24.
> +-- EXCEPT with wildcard: exclude table and all descendants
> +CREATE PUBLICATION testpub6 FOR ALL TABLES EXCEPT TABLE (testpub_tbl3*);
>
> 24a.
> TBH, I don't think this is a "wildcard" -- it is not doing any pattern
> matching. IMO just call it an "asterisk" or a "star".
>
> ~
>
> 24b.
> And put a space before the '*' here.
>
> ======
> .../t/037_rep_changes_except_table.pl
>
> 25.
> +# ============================================
> +# EXCEPT TABLE test cases for partition tables
> +# ============================================
> +# Check behavior of EXCEPT TABLE together with publish_via_partition_root
> +# when applied to a partitioned table and its partitions.
>
>
> Really, that "Check behavior" sentence is generic for all of the
> following tests, so it should also be (within the "=======" of the
> previous comment)
>
> ~~~
>
> 26.
> +$node_publisher->safe_psql(
> + 'postgres', qq(
> + CREATE TABLE sch1.t1(a int) PARTITION BY RANGE(a);
> + CREATE TABLE sch1.part1 PARTITION OF sch1.t1 FOR VALUES FROM (0) TO (5);
> + CREATE TABLE sch1.part2 PARTITION OF sch1.t1 FOR VALUES FROM (6) TO (10);
> + INSERT INTO sch1.t1 VALUES (1), (6);
> +));
> +
> +$node_subscriber->safe_psql(
> + 'postgres', qq(
> + CREATE TABLE sch1.t1(a int);
> + CREATE TABLE sch1.part1(a int);
> + CREATE TABLE sch1.part2(a int);
> +));
>
> 26a.
> There should be a comment for this part that just says something like
> "Setup partition table and partitions on the publisher that map to
> normal tables on the subscriber"
>
> ~
>
> 26b.
> The INSERT should be done later, after the CREATE PUBLICATION but
> before the CREATE SUBSCRIPTION. The pattern will be the same for all
> the test cases.
>
> ~~~
>
> 27.
> +$node_publisher->safe_psql('postgres',
> + "CREATE PUBLICATION tap_pub_part FOR ALL TABLES EXCEPT TABLE (sch1.part1)"
> +);
>
> Even though the publish_via_partition_root is 'false' by default, I
> think you should spell it out explicitly here for clarity.
>
> ~~~
>
> 28.
> +# EXCEPT TABLE (sch1.t1) with publish_via_partition_root = false
> +# Excluding the partitioned table while publish_via_partition_root = false
> +# still allows rows inserted into its partitions to be replicated.
>
> I felt you should word this differently. I don't think you should say
> "inserted into its partitions" because actually, you inserted into the
> partition table, and the data just ends up in the partitions.
>
> ~~~
>
> 29.
> +$node_publisher->safe_psql(
> + 'postgres', qq(
> + CREATE PUBLICATION tap_pub_part FOR ALL TABLES EXCEPT TABLE (sch1.t1);
> + INSERT INTO sch1.t1 VALUES (1), (6);
> +));
>
> Ditto earlier comment. Better to explicitly say
> "publish_via_partition_root=false".
>
I have also addressed the remaining comments and attached the latest patch.

Thanks,
Shlok Kyal

Attachment

pgsql-hackers by date:

Previous
From: Ajit Awekar
Date:
Subject: Re: Periodic authorization expiration checks using GoAway message
Next
From: Shlok Kyal
Date:
Subject: Re: Skipping schema changes in publication