Re: Skipping schema changes in publication - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Skipping schema changes in publication
Date
Msg-id CAHut+PsXP_61ZXuVOx5u9FZGK3oH4taaA59oOzgqyygZx8ezWw@mail.gmail.com
Whole thread Raw
In response to Re: Skipping schema changes in publication  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
List pgsql-hackers
Hi Shlok.

Below are some review comments for v14-0003

======
1. GENERAL

Since the new syntax uses EXCEPT, then, in my opinion, you should try
to use that same term where possible when describing things. I
understand it is hard to do this in text and I agree often it makes
more sense to say "exclude" columns etc, but OTOH in the code there
are lots of places where you could have named vars/params differently:
e.g. 'except_collist' instead of 'exclude_collist' might have been
better.

======
Commit message

2.
Column list specifed with EXCEPT is stored in column "prattrs" in table
"pg_publication_rel" and also column "prexcept" is set to "true", to maintain
the column list that user wants to exclude from the publication.

~

That paragraph could do with some rewording. For example, AFAIK,
"prattrs" is for all column lists -- not just except col-lists, but
the way it is described here sounds different.

Also, /specifed/specified/

======
doc/src/sgml/catalogs.sgml

3. (52.42. pg_publication_rel)

       <para>
-       True if the relation must be excluded
+       True if the relation or column list must be excluded. If publication is
+       created <literal>FOR ALL TABLES</literal> and it is specified as true,
+       the relation should be excluded. Else if it is true the columns in
+       <literal>prattrs</literal> should be excluded from being published.
       </para></entry>

I felt this could be expressed more simply without mentioning anything
about FOR ALL TABLES.

SUGGESTION
True if the column list or relation must be excluded from publication.
If a column list is specified in <literal>prattrs</literal>, then
exclude only those columns. If <literal>prattrs</literal> is NULL,
then exclude the entire relation.

======
doc/src/sgml/logical-replication.sgml

4. (29.5. Column Lists)

   <para>
-   Each publication can optionally specify which columns of each table are
-   replicated to subscribers. The table on the subscriber side must have at
-   least all the columns that are published. If no column list is specified,
-   then all columns on the publisher are replicated.
+   Each publication can optionally specify which columns of each
table should be
+   replicated or excluded from replication. On the subscriber side, the table
+   must include at least all the columns that are published. If no column list
+   is provided, all columns from the publisher are replicated by default.
    See <xref linkend="sql-createpublication"/> for details on the syntax.
   </para>

I felt this patch may have changed too much text. IMO, you only needed
to say "... are replicated or excluded from replication.". The other
changes did not seem necessary.

~~~

5.
   <para>
-   If no column list is specified, any columns added to the table later are
-   automatically replicated. This means that having a column list which names
-   all columns is not the same as having no column list at all.
+   If no column list or a column list with EXCEPT is specified, any columns
+   added to the table later are automatically replicated. This means
that having
+   a column list which names all columns is not the same as having no
+   column list at all. If an column list is specified, any columns added to the
+   table later are automatically replicated.
   </para>

5a.
"This means that having a column list which names all columns is not
the same as having no column list at all." -- That note does not make
sense when you say EXCEPT. I think some rewording is needed here.

~

5b.
"If an column list is specified, any columns added to the table later
are automatically replicated.".

This made no sense -- some words missing?

~~~

6.
    Generated columns can also be specified in a column list. This allows
    generated columns to be published, regardless of the publication parameter
    <link linkend="sql-createpublication-params-with-publish-generated-columns">
-   <literal>publish_generated_columns</literal></link>. See
-   <xref linkend="logical-replication-gencols"/> for details.
+   <literal>publish_generated_columns</literal></link>. Generated columns can
+   be included in column list specified with EXCEPT clause if publication
+   parameter
+   <link linkend="sql-createpublication-params-with-publish-generated-columns">
+   <literal>publish_generated_columns</literal></link> is not set to
+   <literal>none</literal>. Specified generated columns will not be published.
+   See <xref linkend="logical-replication-gencols"/> for details.
   </para>

I am not so sure about this. It seemed overly strict to me.

Why can't it simply say:
"Generated columns can also be specified in a column list. This allows
generated columns to be published or excluded, regardless of the
publication parameter..."

Specifically, I don't know why you need to say:
Generated columns can be included in column list specified with EXCEPT
clause if publication parameter publish_generated_columns is not set
to none. Specified generated columns will not be published.

IIUC, then EXCEPT (gencol1, gencol2) is saying to exclude the named
cols. So if param is "stored", then the named cols will be excluded.
OTOH, if param is "none" then all generated cols will be excluded
anyway, so why not just allow the EXCEPT (gencol,gencol2) here as
well, because the result will be the same.


~~~

7. (29.5.1. Examples)

    <para>
-    Create a table <literal>t1</literal> to be used in the following example.
+    Create tables <literal>t1</literal>, <literal>t2</literal> to be
used in the
+    following example.

/Create tables t1, t2/Create tables t1 and t2/

~~~

8.
    <para>
     Create a publication <literal>p1</literal>. A column list is defined for
-    table <literal>t1</literal> to reduce the number of columns that will be
-    replicated. Notice that the order of column names in the column list does
-    not matter.
+    table <literal>t1</literal> and a column list is defined for table
+    <literal>t2</literal> with EXCEPT clause to reduce the number of
columns that will be
+    replicated. Notice that the order of column names in the column
lists does not matter.

BEFORE
A column list is defined for table t1 and a column list is defined for
table t2...

SUGGESTION (added comma, etc.)
A column list is defined for table t1, and another column list is
defined for table t2...

~~~

9.
The final example still says:
"Only data from the column list of publication p1 is replicated."

That doesn't seem quite appropriate now that you also have an EXCEPT
column list.

SUGGESTION:
Only data specified by the column lists of publication p1 is replicated.

======
doc/src/sgml/ref/create_publication.sgml

10.
+     <para>
+      When a column list is specified with EXCEPT, the named columns are not
+      replicated. Specifying a column list has no effect on
+      <literal>TRUNCATE</literal> commands.
+     </para>

I felt that to be clearer the preceding paragraph should be changed as follows:

/When a column list is specified, only the named columns are
replicated./When a column list without EXCEPT is specified, only the
named columns are replicated./

~~~

11. CREATE PUBLICATION (NOTES section)

11a.
The NOTES talk about replica identity columns -- should you mention EXCEPT here?

~

11b.
The NOTES talk about generated columns -- should you mention EXCEPT here?

======
src/backend/catalog/pg_publication.c

12. check_and_fetch_column_list

+ if (!isnull)
+ except = DatumGetBool(cfdatum);
+
+ *except_columns = except && !pub->alltables;

AFAICT, you can Assert(!pub->alltables) because you already checked
that earlier up front.
So you don't need 'except' var either. Just assign *except_cols up
front and then overwrite it later if true.

SUGGESTION:

*except_cols = false;

if (pub->alltables)
  return false;
...
if (!isnull)
 *except_cols = DatumGetBool(cfdatum);

~~~

13. publication_add_relation

  /* Validate and translate column names into a Bitmapset of attnums. */
- attnums = pub_collist_validate(pri->relation, pri->columns);
+ attnums = pub_collist_validate(pri->relation, pri->columns,
+    pri->except && !pub->alltables,
+    pub->pubgencols_type);


I am wondering why we are even calling a function to validate column
lists if pub->alltables was true. AFAIK, that combination of
column-lists and FOR ALL TABLES is not even possible, so the code
seems strange.

~~~

14. pub_exclude_collist_validate
.
+ /*
+ * Check if column list specified with EXCEPT have any stored
+ * generated column and 'publish_generated_columns' is not set to
+ * 'stored'.
+ */
+ if (except_columns &&
+ TupleDescAttr(tupdesc, attnum - 1)->attgenerated ==
ATTRIBUTE_GENERATED_STORED &&
+ pubgencols_type != PUBLISH_GENCOLS_STORED)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot use stored generated column \"%s\" in publication
column list specified with EXCEPT when \"%s\" set to \"%s\"",
+    colname, "publish_generated_columns", "stored"));

As mentioned in the above DOCS comments, I was having doubts about why
we have this error.

If the parameter says "none", then generated columns will not be
replicated, so why should we care if the user also says
EXCEPT(gencol1,gencol2). Either way, the result will be the same; the
generated column will not be published.

~~~

15. GetRelationPublications

  {
  HeapTuple tup = &pubrellist->members[i]->tuple;
  Oid pubid = ((Form_pg_publication_rel) GETSTRUCT(tup))->prpubid;
+ HeapTuple pubtup = SearchSysCache1(PUBLICATIONOID, ObjectIdGetDatum(pubid));
+ bool is_table_excluded = ((Form_pg_publication)
GETSTRUCT(pubtup))->puballtables &&
+ ((Form_pg_publication_rel) GETSTRUCT(tup))->prexcept;

- if (except_flag == ((Form_pg_publication_rel) GETSTRUCT(tup))->prexcept)
+ if (except_flag == is_table_excluded)
  result = lappend_oid(result, pubid);
+
+ ReleaseS


I'm not 100% sure you need the additional 'pubtup'... Can't you just
look at the "prattrs" field to see if a column-list was specified? If
"prattrs" is null and "prexcept" is true, isn't that the same
combination as what you are looking for here?

~~~

16. pg_get_publication_tables

+ columnsDatum = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple,
+    Anum_pg_publication_rel_prattrs,
+    &(nulls[2]));
+
+ /* if column list is specified with EXCEPT */
+ if (!pub->alltables && except)
+ columns = pub_collist_to_bitmapset(NULL, columnsDatum, NULL);
+ else
+ values[2] = columnsDatum;

16a.
Something seems fishy here. Isn't there a pathway where you missed
assigning value[2] to anything?

~

16b.
Also, I feel there should be some other boolean variable used here
instead of checking bot (!pub->alltables && except) in multiple
places.


======
src/backend/replication/pgoutput/pgoutput.c

17. RelationSyncEntry
+
+ /* Indicate if no column is included in the publication */
+ bool no_cols_published;

Maybe this can have a more explanatory comment to explain why it is needed?

~~~

18. check_and_init_gencol

+ bool found = false;
+ bool except_columns = false;
+
+ found = check_and_fetch_column_list(pub, entry->publish_as_relid, NULL,
+ NULL, &except_columns);
+
  /*
  * The column list takes precedence over the
  * 'publish_generated_columns' parameter. Those will be checked later,
- * see pgoutput_column_list_init.
+ * see pgoutput_column_list_init. But when a column list is specified
+ * with EXCEPT, it should be checked.
  */
- if (check_and_fetch_column_list(pub, entry->publish_as_relid, NULL, NULL))
+ if (found && !except_columns)
  continue;

The variable 'found' seems a poor name; how about 'has_column_list' or similar?

~~~

19. pgoutput_change

+ /*
+ * If all columns of a table is present in column list specified with
+ * EXCEPT, skip publishing the changes.
+ */
+ if (relentry->no_cols_published)
+ return;

/is present/are present/

======
src/bin/pg_dump/pg_dump.c

20. getPublicationTables

+ if (strcmp(prexcept, "t") == 0 && PQgetisnull(res, i, i_prattrs))
  pubrinfo[j].dobj.objType = DO_PUBLICATION_EXCEPT_REL;
+ else
+ pubrinfo[j].dobj.objType = DO_PUBLICATION_REL;

  pubrinfo[j].dobj.catId.tableoid =
  atooid(PQgetvalue(res, i, i_tableoid));
@@ -4797,6 +4797,7 @@ getPublicationTables(Archive *fout, TableInfo
tblinfo[], int numTables)
  pubrinfo[j].pubrelqual = NULL;
  else
  pubrinfo[j].pubrelqual = pg_strdup(PQgetvalue(res, i, i_prrelqual));
+ pubrinfo[j].pubexcept = (strcmp(prexcept, "t") == 0);


Why not assign pubrinfo[j].pubexcept earlier so you don't have to
repeat the strcmp?

~~~

21.
- if (strcmp(prexcept, "t") == 0)
+ if (strcmp(prexcept, "t") == 0 && PQgetisnull(res, i, i_prattrs))
  simple_ptr_list_append(&exceptinfo, &pubrinfo[j]);

Why not assign pubrinfo[j].pubexcept earlier so you don't have to
repeat the strcmp? Same also for the PQgetisnull(res, i,
i_prattrs))...

~~~

22. dumpPublicationTable

  if (pubrinfo->pubrattrs)
- appendPQExpBuffer(query, " (%s)", pubrinfo->pubrattrs);
+ {
+ if (pubrinfo->pubexcept)
+ appendPQExpBuffer(query, " EXCEPT (%s)", pubrinfo->pubrattrs);
+ else
+ appendPQExpBuffer(query, " (%s)", pubrinfo->pubrattrs);
+ }

SUGGESTION
{
  if (pubrinfo->pubexcept)
    appendPQExpBuffer(query, " EXCEPT");

  appendPQExpBuffer(query, " (%s)", pubrinfo->pubrattrs);
}

======
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Conflict detection for update_deleted in logical replication
Next
From: Richard Guo
Date:
Subject: Re: Virtual generated columns