Re: Skipping schema changes in publication - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Skipping schema changes in publication
Date
Msg-id CAHut+Pu5Ukuvd_5oK_mTEjSOEAYupbEXzRqXwSPoLe_sDkA_fg@mail.gmail.com
Whole thread Raw
In response to Re: Skipping schema changes in publication  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
List pgsql-hackers
Hi Shlok -

Here are some review comments for v31-0001 (EXCEPT (tablelist))

======
Commit message

1.
The new syntax allows specifying excluded relations when creating or altering
a publication. For example:
CREATE PUBLICATION pub1 FOR ALL TABLES EXCEPT TABLE (t1,t2);

~

In v30, you removed all the ALTER PUBLICATION changes, so the "or
altering" in the message above also needs to be removed.

======
doc/src/sgml/logical-replication.sgml

2.
   <para>
-   To add tables to a publication, the user must have ownership rights on the
-   table. To add all tables in schema to a publication, the user must be a
-   superuser. To create a publication that publishes all tables, all tables in
-   schema, or all sequences automatically, the user must be a superuser.
+   To create a publication using <literal>FOR ALL TABLES</literal>,
+   <literal>FOR ALL SEQUENCES</literal> or
+   <literal>FOR TABLES IN SCHEMA</literal>, the user must be a
superuser. To add
+   <literal>ALL TABLES</literal> or <literal>TABLES IN SCHEMA</literal> to a
+   publication, the user must be a superuser. To add tables to a publication,
+   the user must have ownership rights on the table.
   </para>

This is a good improvement, but I was not sure why it is in this
patch. Should it be a separate thread for a docs improvement?

======
src/backend/catalog/pg_publication.c

GetTopMostAncestorInPublication:

3.
{
  Oid ancestor = lfirst_oid(lc);
- List    *apubids = GetRelationPublications(ancestor);
- List    *aschemaPubids = NIL;
+ List    *apubids = NIL;
+ List    *aexceptpubids = NIL;
+ List    *aschemapubids = NIL;
+ bool set_top = false;
+
+ GetRelationPublications(ancestor, &apubids, &aexceptpubids);

  level++;

- if (list_member_oid(apubids, puboid))
+ /* check if member of table publications */
+ set_top = list_member_oid(apubids, puboid);
+ if (!set_top)
  {
- topmost_relid = ancestor;
+ aschemapubids = GetSchemaPublications(get_rel_namespace(ancestor));

- if (ancestor_level)
- *ancestor_level = level;
+ /* check if member of schema publications */
+ set_top = list_member_oid(aschemapubids, puboid);
+
+ /*
+ * If the publication is all tables publication and the table is
+ * not part of exception tables.
+ */
+ if (!set_top && puballtables)
+ set_top = !list_member_oid(aexceptpubids, puboid);
  }
- else
+
+ if (set_top)
  {
- aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
- if (list_member_oid(aschemaPubids, puboid))
- {
- topmost_relid = ancestor;
+ topmost_relid = ancestor;

- if (ancestor_level)
- *ancestor_level = level;
- }
+ if (ancestor_level)
+ *ancestor_level = level;
  }

  list_free(apubids);
- list_free(aschemaPubids);
+ list_free(aschemapubids);
+ list_free(aexceptpubids);
  }

That 'aschemapubids' can be declared and freed within the if block.

~~~

publication_add_relation:

4.
+ /*
+ * Check when a partition is excluded via EXCEPT TABLE while the
+ * publication has publish_via_partition_root = true.
+ */
+ if (pub->alltables && pub->pubviaroot && pri->except &&
+ targetrel->rd_rel->relispartition)
+ ereport(WARNING,


This comment doesn't sound quite right:

SUGGESTION
Handle the case where a partition is excluded by EXCEPT TABLE while
publish_via_partition_root = true.

~~~

5.
+ /*
+ * Check when a partitioned table is excluded via EXCEPT TABLE while the
+ * publication has publish_via_partition_root = false.
+ */
+ if (pub->alltables && !pub->pubviaroot && pri->except &&
+ targetrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ ereport(WARNING,

Ditto. Reword like suggested in the previous review comment.

~~~

6.
+/*
+ * Get the list of publication oids associated with a specified relation.
+ * pubids is filled with the list of publication oids the relation is part of.
+ * except_pubids is filled with the list of publication oids the relation is
+ * excluded from.
+ *
+ * This function returns true if the relation is part of any publication.
+ */

Maybe putting 'pubids' and 'except_pubids' in single quotes will help
readability of this comment?

Also, these are already Lists, so they are not filled with lists.

SUGGESTION
Parameter 'pubids' returns the OIDs of the publications the relation is part of.
Parameter 'except_pubids' returns the OIDs of publications the
relation is excluded from.

~~~

GetPublicationRelations:

7.
 /*
- * Gets list of relation oids for a publication.
+ * Return the list of relation OIDs for a publication.
+ *
+ * For a FOR ALL TABLES publication, this returns the list of tables that were
+ * explicitly excluded via an EXCEPT TABLE clause.
+ *
+ * For a FOR TABLE publication, this returns the list of tables explicitly
+ * included in the publication.
  *
- * This should only be used FOR TABLE publications, the FOR ALL
TABLES/SEQUENCES
- * should use GetAllPublicationRelations().
+ * Publications declared with FOR ALL TABLES or FOR ALL SEQUENCES should use
+ * GetAllPublicationRelations() to obtain the complete set of tables covered by
+ * the publication.
  */
 List *
 GetPublicationRelations(Oid pubid, PublicationPartOpt pub_partopt)

7a.
The function is called 'GetPublicationRelations', so it seems
unintuitive that it sometimes returns the list of all the tables that
are *excluded* from the publication. If you are going to have one
single function that does everything, then IMO it might be better to
hide that behind some wrapper functions like:
GetPublicationMemberRelations
GetPublicationExcludedRelations

Consider also that all these assumptions might be OK today but they
won't be OK in the future. e.g. One day, when named FOR SEQUENCE
sq1,sq2 are supported then you will be alble to write a command like
FOR ALL TABLES EXCEPT (t1), FOR SEQUENCE sq1,sq2. That's going to be a
muddle of some included and some excluded relations. So, it is better
to cater for that scenario now, rather than have to rewrite all of
this function again in the future. e.g. Maybe instead of this function
returning one list it is better to return included/excluded Lists or
relations as output parameters?

~

7b.
Also, comments like "Publications declared with FOR ALL TABLES or FOR
ALL SEQUENCES should use..." seems like too many assumptions are being
made. It would be better to enforce the calling requirements using
parameter checking and Asserts instead instead of hoping that callers
are going to abide by the comments.

~~~

GetAllPublicationRelations:

8.
+ exceptlist = GetPublicationRelations(pubid, pubviaroot ?
+ PUBLICATION_PART_ALL :
+ PUBLICATION_PART_ROOT);

This is similar to the above review comment. I'm not sure how you can
just assume that this must be the "except list" -- AFAICT this assumes
that 'GetAllPublicationRelations' can only be called by FOR ALL TABLES
(???). Seems like a lot of assumptions, that would be much better to
be enforced by Asserts in the code.

======
src/backend/commands/publicationcmds.c

pub_rf_contains_invalid_column:

9.
 bool
 pub_rf_contains_invalid_column(Oid pubid, Relation relation, List *ancestors,
-    bool pubviaroot)
+    bool pubviaroot, bool puballtables)

I felt that 'puballtables' is more "important" than 'pubviaroot' so
maybe it should come earlier in the parameter list. (e.g. make it more
similar to 'pub_contains_invalid_column')

======
src/backend/parser/gram.y

10.
+ pub_except_obj_list opt_except_clause

I felt that 'opt_except_clause' should better be called
'opt_pub_except_clause' or 'pub_opt_except_clause' because without
'pub' it is a bit vague.

~~~

11.
+/*
+ * ALL TABLES EXCEPT ( table_name [, ...] ) specification
+ */

11a
This comment should be up where all the other CREATE PUBLICATION
syntax is commented.

~

11b.
Also, there is a missing optional "[TABLE]" part.

~~~

12.
+pub_except_obj_list: PublicationExceptObjSpec
+ { $$ = list_make1($1); }
+ | pub_except_obj_list ',' PublicationExceptObjSpec
+ { $$ = lappend($1, $3); }
+ ;
+
+opt_except_clause:
+ EXCEPT opt_table '(' pub_except_obj_list ')' { $$ = $4; }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;

I felt the clause should be defined before the obj list because that
seems the natural order to read these.

======
src/bin/pg_dump/pg_dump.c

13.
+static SimplePtrList exceptinfo = {NULL, NULL};

Having this as global seems a bit hacky. It has nothing in common with
all the other nearby lists, which are commented as being based on
"patterns given by command-line switches"

~~~

dumpPublication:

14.
+ /* Include exception tables if the publication has EXCEPT TABLEs */
+ for (SimplePtrListCell *cell = exceptinfo.head; cell; cell = cell->next)
+ {
+ PublicationRelInfo *pubrinfo = (PublicationRelInfo *) cell->ptr;
+ TableInfo  *tbinfo;
+
+ if (pubinfo == pubrinfo->publication)
+ {
+ tbinfo = pubrinfo->pubtable;

That 'tbinfo' can be declared within the "if".

~~~

15.
+ appendPQExpBuffer(query, "ONLY %s", fmtQualifiedDumpable(tbinfo));

ONLY is not the default. How did you decide that "ONLY" is the correct
thing to do here?

~~~

getPublicationTables:

16.
- pubrinfo[j].dobj.objType = DO_PUBLICATION_REL;
+ if (prexcept)
+ pubrinfo[j].dobj.objType = DO_PUBLICATION_EXCEPT_REL;
+ else
+ pubrinfo[j].dobj.objType = DO_PUBLICATION_REL;
+

Would a single assignment (ternary) make this code simpler and easier to read?

SUGGESTION
pubrinfo[j].dobj.objType = prexcept ?
  DO_PUBLICATION_EXCEPT_REL :
  DO_PUBLICATION_REL;

======
src/bin/pg_dump/t/002_pg_dump.pl

17.
+ 'CREATE PUBLICATION pub10' => {
+ create_order => 50,
+ create_sql =>
+   'CREATE PUBLICATION pub10 FOR ALL TABLES EXCEPT TABLE
(dump_test.test_table_generated);',
+ regexp => qr/^
+ \QCREATE PUBLICATION pub10 FOR ALL TABLES EXCEPT TABLE (ONLY
dump_test.test_table_generated, ONLY
dump_test.test_table_generated_child2, ONLY
dump_test.test_table_generated_child1) WITH (publish = 'insert,
update, delete, truncate');\E
+ /xm,
+ like => { %full_runs, section_post_data => 1, },
+ },
+

These "generated" names seem unusual. I saw there are some other
tables like 'dump_test.test_inheritance_child' and
'dump_test.test_inheritance_parent'. Can you use those more normal
table names instead?

Also curious - does the order of the tests matter? I saw that the
CREATE TABLE tests seem to be coming after the CREATE PUBLICATION
tests that are using them.

~~~
18.
- if (!defined($tests{$test}->{all_runs})
+ if (   !defined($tests{$test}->{all_runs})

Why add this whitespace?

======
src/include/nodes/parsenodes.h

19.
  AP_SetObjects, /* set list of objects */
+ AP_Reset, /* reset the publication */
 } AlterPublicationAction;

AFAIK, you removed all ALTER command changes from v30-0001. So this
should not be here.

~~~

20.
+ bool for_all_tables; /* Special publication for all tables in db */
  AlterPublicationAction action; /* What action to perform with the given
  * objects */
 } AlterPublicationStmt;

AFAIK, you removed all ALTER command changes from v30-0001. So this
should not be here.

======
src/test/regress/sql/publication.sql

21.
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION testpub_foralltables_excepttable FOR ALL TABLES
EXCEPT TABLE (testpub_tbl1, testpub_tbl2);
+-- specify EXCEPT without TABLE
+CREATE PUBLICATION testpub_foralltables_excepttable1 FOR ALL TABLES
EXCEPT (testpub_tbl1);

Should be 2 comments here for the 2x CREATE:

# Exclude tables using FOR ALL TABLES EXCEPT TABLE (tablelist)

# Exclude tables using FOR ALL TABLES EXCEPT (tablelist)

~~~

22.
 CREATE TABLE testpub_tbl3 (a int);
 CREATE TABLE testpub_tbl3a (b text) INHERITS (testpub_tbl3);

If you rename these tables like 'testpub_tbl_parent' and
'testpub_tbl_child', it will be much easier to see what is going on.

~~~

23.
+CREATE PUBLICATION testpub5 FOR ALL TABLES EXCEPT TABLE (testpub_tbl3);

Missing comment -- something like:
# Exclude parent table, omitting both of 'ONLY' and '*'

~~~

24.
+-- EXCEPT with wildcard: exclude table and all descendants
+CREATE PUBLICATION testpub6 FOR ALL TABLES EXCEPT TABLE (testpub_tbl3*);

24a.
TBH, I don't think this is a "wildcard" -- it is not doing any pattern
matching. IMO just call it an "asterisk" or a "star".

~

24b.
And put a space before the '*' here.

======
.../t/037_rep_changes_except_table.pl

25.
+# ============================================
+# EXCEPT TABLE test cases for partition tables
+# ============================================
+# Check behavior of EXCEPT TABLE together with publish_via_partition_root
+# when applied to a partitioned table and its partitions.


Really, that "Check behavior" sentence is generic for all of the
following tests, so it should also be (within the "=======" of the
previous comment)

~~~

26.
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE sch1.t1(a int) PARTITION BY RANGE(a);
+ CREATE TABLE sch1.part1 PARTITION OF sch1.t1 FOR VALUES FROM (0) TO (5);
+ CREATE TABLE sch1.part2 PARTITION OF sch1.t1 FOR VALUES FROM (6) TO (10);
+ INSERT INTO sch1.t1 VALUES (1), (6);
+));
+
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE sch1.t1(a int);
+ CREATE TABLE sch1.part1(a int);
+ CREATE TABLE sch1.part2(a int);
+));

26a.
There should be a comment for this part that just says something like
"Setup partition table and partitions on the publisher that map to
normal tables on the subscriber"

~

26b.
The INSERT should be done later, after the CREATE PUBLICATION but
before the CREATE SUBSCRIPTION. The pattern will be the same for all
the test cases.

~~~

27.
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_part FOR ALL TABLES EXCEPT TABLE (sch1.part1)"
+);

Even though the publish_via_partition_root is 'false' by default, I
think you should spell it out explicitly here for clarity.

~~~

28.
+# EXCEPT TABLE (sch1.t1) with publish_via_partition_root = false
+# Excluding the partitioned table while publish_via_partition_root = false
+# still allows rows inserted into its partitions to be replicated.

I felt you should word this differently. I don't think you should say
"inserted into its partitions" because actually, you inserted into the
partition table, and the data just ends up in the partitions.

~~~

29.
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ CREATE PUBLICATION tap_pub_part FOR ALL TABLES EXCEPT TABLE (sch1.t1);
+ INSERT INTO sch1.t1 VALUES (1), (6);
+));

Ditto earlier comment. Better to explicitly say
"publish_via_partition_root=false".

======
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: backpatch tests: Rename conflicting role names to 14/15
Next
From: Peter Geoghegan
Date:
Subject: Re: index prefetching