Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Column Filtering in Logical Replication
Date
Msg-id 6e0943fc-9cd7-45e9-6c1e-f9279fd36e2b@enterprisedb.com
Whole thread Raw
In response to Re: Column Filtering in Logical Replication  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
Hi Peter,

Thanks for the review and sorry for taking so long.

I've addressed most of the comments in the patch I sent a couple minutes 
ago. More comments in-line:


On 1/28/22 09:39, Peter Smith wrote:
> Here are some review comments for the v17-0001 patch.
> 
> ~~~
> 
> 1. Commit message
> 
> If no column list is specified, all the columns are replicated, as
> previously
> 
> Missing period (.) at the end of that sentence.
> 

I plan to reword that anyway.

> ~~~
> 
> 2. doc/src/sgml/catalogs.sgml
> 
> +      <para>
> +       This is an array of values that indicates which table columns are
> +       part of the publication.  For example a value of <literal>1 3</literal>
> +       would mean that the first and the third table columns are published.
> +       A null value indicates that all attributes are published.
> +      </para></entry>
> 
> Missing comma:
> "For example" --> "For example,"
> 

Fixed.

> Terms:
> The text seems to jump between "columns" and "attributes". Perhaps,
> for consistency, that last sentence should say: "A null value
> indicates that all columns are published."
> 

Yeah, but that's a pre-existing problem. I've modified the parts added 
by the patch to use "columns" though.

> ~~~
> 
> 3. doc/src/sgml/protocol.sgml
> 
>   </variablelist>
> -        Next, the following message part appears for each column
> (except generated columns):
> +        Next, the following message part appears for each column (except
> +        generated columns and other columns that don't appear in the column
> +        filter list, for tables that have one):
>   <variablelist>
> 
> Perhaps that can be expressed more simply, like:
> 
> Next, the following message part appears for each column (except
> generated columns and other columns not present in the optional column
> filter list):
> 

Not sure. I'll think about it.

> ~~~
> 
> 4. doc/src/sgml/ref/alter_publication.sgml
> 
> +ALTER PUBLICATION <replaceable class="parameter">name</replaceable>
> ALTER TABLE <replaceable
> class="parameter">publication_object</replaceable> SET COLUMNS { (
> <replaceable class="parameter">name</replaceable> [, ...] ) | ALL }
> 
> The syntax chart looks strange because there is already a "TABLE" and
> a column_name list within the "publication_object" definition, so do
> ALTER TABLE and publication_object co-exist?
> According to the current documentation it suggests nonsense like below is valid:
> ALTER PUBLICATION mypublication ALTER TABLE TABLE t1 (a,b,c) SET
> COLUMNS (a,b,c);
> 

Yeah, I think that's wrong. I think "publication_object" is wrong in 
this place, so I've used "table_name".

> --
> 
> But more fundamentally, I don't see why any new syntax is even needed at all.
> 
> Instead of:
> ALTER PUBLICATION mypublication ALTER TABLE users SET COLUMNS
> (user_id, firstname, lastname);
> Why not just:
> ALTER PUBLICATION mypublication ALTER TABLE users (user_id, firstname,
> lastname);
> 

I haven't modified the grammar yet, but I agree SET COLUMNS seems a bit 
unnecessary. It also seems a bit inconsistent with ADD TABLE which 
simply lists the columns right adter the table name.

> Then, if the altered table defines a *different* column list then it
> would be functionally equivalent to whatever your SET COLUMNS is doing
> now. AFAIK this is how the Row-Filter [1] works, so that altering an
> existing table to have a different Row-Filter just overwrites that
> table's filter. IMO the Col-Filter behaviour should work the same as
> that - "SET COLUMNS" is redundant.
> 

I'm sorry, I don't understand what this is saying :-(

> ~~~
> 
> 5. doc/src/sgml/ref/alter_publication.sgml
> 
> -    TABLE [ ONLY ] <replaceable
> class="parameter">table_name</replaceable> [ * ] [, ... ]
> +    TABLE [ ONLY ] <replaceable
> class="parameter">table_name</replaceable> [ * ] [ ( <replaceable
> class="parameter">column_name</replaceable>, [, ... ] ) ] [, ... ]
> 
> That extra comma after the "column_name" seems wrong because there is
> one already in "[, ... ]".
> 

Fixed.

> ~~~
> 
> 6. doc/src/sgml/ref/create_publication.sgml
> 
> -    TABLE [ ONLY ] <replaceable
> class="parameter">table_name</replaceable> [ * ] [, ... ]
> +    TABLE [ ONLY ] <replaceable
> class="parameter">table_name</replaceable> [ * ] [ ( <replaceable
> class="parameter">column_name</replaceable>, [, ... ] ) ] [, ... ]
> 
> (Same as comment #5).
> That extra comma after the "column_name" seems wrong because there is
> one already in "[, ... ]".
> 

Fixed.

> ~~~
> 
> 7. doc/src/sgml/ref/create_publication.sgml
> 
> +     <para>
> +      When a column list is specified, only the listed columns are replicated;
> +      any other columns are ignored for the purpose of replication through
> +      this publication.  If no column list is specified, all columns of the
> +      table are replicated through this publication, including any columns
> +      added later.  If a column list is specified, it must include the replica
> +      identity columns.
> +     </para>
> 
> Suggest to re-word this a bit simpler:
> 
> e.g.
> - "listed columns" --> "named columns"
> - I don't think it is necessary to say the unlisted columns are ignored.
> - I didn't think it is necessary to say "though this publication"
> 
> AFTER
> When a column list is specified, only the named columns are replicated.
> If no column list is specified, all columns of the table are replicated,
> including any columns added later. If a column list is specified, it must
> include the replica identity columns.
> 

Fixed, seems reasonable.

> ~~~
> 
> 8. doc/src/sgml/ref/create_publication.sgml
> 
> Consider adding another example showing a CREATE PUBLICATION which has
> a column list.
> 

Added.

> ~~~
> 
> 9. src/backend/catalog/pg_publication.c - check_publication_add_relation
> 
>   /*
> - * Check if relation can be in given publication and throws appropriate
> - * error if not.
> + * Check if relation can be in given publication and that the column
> + * filter is sensible, and throws appropriate error if not.
> + *
> + * targetcols is the bitmapset of attribute numbers given in the column list,
> + * or NULL if it was not specified.
>    */
> 
> Typo: "targetcols" --> "columns" ??
> 

Right, I noticed that too.

> ~~~
> 
> 10. src/backend/catalog/pg_publication.c - check_publication_add_relation
> 
> +
> + /* Make sure the column list checks out */
> + if (columns != NULL)
> + {
> 
> Perhaps "checks out" could be worded better.
> 

Right, I expanded that in my review.

> ~~~
> 
> 11. src/backend/catalog/pg_publication.c - check_publication_add_relation
> 
> + /* Make sure the column list checks out */
> + if (columns != NULL)
> + {
> + /*
> + * Even if the user listed all columns in the column list, we cannot
> + * allow a column list to be specified when REPLICA IDENTITY is FULL;
> + * that would cause problems if a new column is added later, because
> + * the new column would have to be included (because of being part of
> + * the replica identity) but it's technically not allowed (because of
> + * not being in the publication's column list yet).  So reject this
> + * case altogether.
> + */
> + if (replidentfull)
> + ereport(ERROR,
> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("invalid column list for publishing relation \"%s\"",
> +    RelationGetRelationName(targetrel)),
> + errdetail("Cannot specify a column list on relations with REPLICA
> IDENTITY FULL."));
> +
> + check_publication_columns(pub, targetrel, columns);
> + }
> 
> IIUC almost all of the above comment and code is redundant because by
> calling the check_publication_columns function it will do exactly the
> same check...
> 
> So,  that entire slab might be replaced by 2 lines:
> 
> if (columns != NULL)
> check_publication_columns(pub, targetrel, columns);
> 

You're right. But I think we can make that even simpler by moving even 
the (columns!=NULL) check into the function.

> ~~~
> 
> 12. src/backend/catalog/pg_publication.c - publication_set_table_columns
> 
> +publication_set_table_columns(Relation pubrel, HeapTuple pubreltup,
> +   Relation targetrel, List *columns)
> +{
> + Bitmapset  *attset;
> + AttrNumber *attarray;
> + HeapTuple copytup;
> + int natts;
> + bool nulls[Natts_pg_publication_rel];
> + bool replaces[Natts_pg_publication_rel];
> + Datum values[Natts_pg_publication_rel];
> +
> + memset(values, 0, sizeof(values));
> + memset(nulls, 0, sizeof(nulls));
> + memset(replaces, false, sizeof(replaces));
> 
> It seemed curious to use memset false for "replaces" but memset 0 for
> "nulls", since they are both bool arrays (??)
> 

Fixed.

> ~~~
> 
> 13. src/backend/catalog/pg_publication.c - compare_int16
> 
> +/* qsort comparator for attnums */
> +static int
> +compare_int16(const void *a, const void *b)
> +{
> + int av = *(const int16 *) a;
> + int bv = *(const int16 *) b;
> +
> + /* this can't overflow if int is wider than int16 */
> + return (av - bv);
> +}
> 
> This comparator seems common with another one already in the PG
> source. Perhaps it would be better for generic comparators (like this
> one) to be in some common code instead of scattered cut/paste copies
> of the same thing.
> 

I thought about it, but it doesn't really seem worth the effort.

> ~~~
> 
> 14. src/backend/commands/publicationcmds.c - AlterPublicationTables
> 
> + else if (stmt->action == AP_SetColumns)
> + {
> + Assert(schemaidlist == NIL);
> + Assert(list_length(tables) == 1);
> +
> + PublicationSetColumns(stmt, pubform,
> +   linitial_node(PublicationTable, tables));
> + }
> 
> (Same as my earlier review comment #4)
> 
> Suggest to call this PublicationSetColumns based on some smarter
> detection logic of a changed column list. Please refer to the
> Row-Filter patch [1] for this same function.
> 

I don't understand. Comment #4 is about syntax, no?

> ~~~
> 
> 15. src/backend/commands/publicationcmds.c - AlterPublicationTables
> 
> + /* This is not needed to delete a table */
> + pubrel->columns = NIL;
> 
> Perhaps a more explanatory comment would be better there?
> 

If I understand the comment, it says we don't actually need to set 
columns to NIL. In which case we can just get rid of the change.

> ~~~
> 
> 16. src/backend/commands/tablecmds.c - relation_mark_replica_identity
> 
> @@ -15841,6 +15871,7 @@ relation_mark_replica_identity(Relation rel,
> char ri_type, Oid indexOid,
>    CatalogTupleUpdate(pg_index, &pg_index_tuple->t_self, pg_index_tuple);
>    InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
>    InvalidOid, is_internal);
> +
>    /*
>    * Invalidate the relcache for the table, so that after we commit
>    * all sessions will refresh the table's replica identity index
> 
> Spurious whitespace change seemed unrelated to the Col-Filter patch.
> 

Fixed.

> ~~~
> 
> 17. src/backend/parser/gram.y
> 
>    *
> + * ALTER PUBLICATION name SET COLUMNS table_name (column[, ...])
> + * ALTER PUBLICATION name SET COLUMNS table_name ALL
> + *
> 
> (Same as my earlier review comment #4)
> 
> IMO there was no need for the new syntax of SET COLUMNS.
> 

Not modified yet, we'll see about the syntax.

> ~~~
> 
> 18. src/backend/replication/logical/proto.c - logicalrep_write_attrs
> 
> - /* send number of live attributes */
> - for (i = 0; i < desc->natts; i++)
> - {
> - if (TupleDescAttr(desc, i)->attisdropped || TupleDescAttr(desc,
> i)->attgenerated)
> - continue;
> - nliveatts++;
> - }
> - pq_sendint16(out, nliveatts);
> -
>    /* fetch bitmap of REPLICATION IDENTITY attributes */
>    replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL);
>    if (!replidentfull)
>    idattrs = RelationGetIdentityKeyBitmap(rel);
> 
> + /* send number of live attributes */
> + for (i = 0; i < desc->natts; i++)
> + {
> + Form_pg_attribute att = TupleDescAttr(desc, i);
> +
> + if (att->attisdropped || att->attgenerated)
> + continue;
> + if (columns != NULL && !bms_is_member(att->attnum, columns))
> + continue;
> + nliveatts++;
> + }
> + pq_sendint16(out, nliveatts);
> +
> 
> This change seemed to have the effect of moving that 4 lines of
> "replidentfull" code from below the loop to above the loop. But moving
> that code seems unrelated to the Col-Filter patch. (??).
> 

Right, restored the original code.

> ~~~
> 
> 19. src/backend/replication/logical/tablesync.c - fetch_remote_table_info
> 
> @@ -793,12 +877,12 @@ fetch_remote_table_info(char *nspname, char *relname,
> 
>    ExecClearTuple(slot);
>    }
> +
>    ExecDropSingleTupleTableSlot(slot);
> -
> - lrel->natts = natt;
> -
>    walrcv_clear_result(res);
>    pfree(cmd.data);
> +
> + lrel->natts = natt;
>   }
> 
> The shuffling of those few lines seems unrelated to any requirement of
> the Col-Filter patch (??)
> 

Yep, undone. I'd bet this is simply due to older versions of the patch 
touching this place, and then undoing some of it.

> ~~~
> 
> 20. src/backend/replication/logical/tablesync.c - copy_table
> 
> + for (int i = 0; i < lrel.natts; i++)
> + {
> + appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> + if (i < lrel.natts - 1)
> + appendStringInfoString(&cmd, ", ");
> + }
> 
> Perhaps that could be expressed more simply if the other way around like:
> 
> for (int i = 0; i < lrel.natts; i++)
> {
> if (i)
> appendStringInfoString(&cmd, ", ");
> appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> }
> 

I used a slightly different version.

> ~~~
> 
> 21. src/backend/replication/pgoutput/pgoutput.c
> 
> +
> + /*
> + * Set of columns included in the publication, or NULL if all columns are
> + * included implicitly.  Note that the attnums in this list are not
> + * shifted by FirstLowInvalidHeapAttributeNumber.
> + */
> + Bitmapset  *columns;
> 
> Typo: "in this list" --> "in this set" (??)
> 

"bitmap" is what we call Bitmapset so I used that.

> ~~~
> 
> 22. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
> 
>    * Don't publish changes for partitioned tables, because
> - * publishing those of its partitions suffices, unless partition
> - * changes won't be published due to pubviaroot being set.
> + * publishing those of its partitions suffices.  (However, ignore
> + * this if partition changes are not to published due to
> + * pubviaroot being set.)
>    */
> 
> This change seems unrelated to the Col-Filter patch, so perhaps it
> should not be here at all.
> 
> Also, typo: "are not to published"
> 

Yeah, unrelated. Reverted.

> ~~~
> 
> 23. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
> 
> + /*
> + * Obtain columns published by this publication, and add them
> + * to the list for this rel.  Note that if at least one
> + * publication has a empty column list, that means to publish
> + * everything; so if we saw a publication that includes all
> + * columns, skip this.
> + */
> 
> Typo: "a empty" --> "an empty"
> 

Fixed.

> ~~~
> 
> 24. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
> 
> + if (isnull)
> + {
> + /*
> + * If we see a publication with no columns, reset the
> + * list and ignore further ones.
> + */
> 
> Perhaps that comment is meant to say "with no column filter" instead
> of "with no columns"?
> 

Yep, fixed.

> ~~~
> 
> 25. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
> 
> + if (isnull)
> + {
> ...
> + }
> + else if (!isnull)
> + {
> ...
> + }
> 
> Is the "if (!isnull)" in the else just to be really REALLY sure it is not null?
> 

Double-tap ;-) Removed the condition.

> ~~~
> 
> 26. src/bin/pg_dump/pg_dump.c - getPublicationTables
> 
> + pubrinfo[i].pubrattrs = attribs->data;
> + }
> + else
> + pubrinfo[j].pubrattrs = NULL;
> 
> I got confused reading this code. Are those different indices 'i' and
> 'j' correct?
> 

Good catch! I think you're right and it should be "j" in both places. 
This'd only cause trouble in selective pg_dumps (when dumping selected 
tables). The patch clearly needs some pg_dump tests.

> ~~~
> 
> 27. src/bin/psql/describe.c
> 
> The Row-Filter [1] displays filter information not only for the psql
> \dRp+ command but also for the psql \d <tablename> command. Perhaps
> the Col-Filter patch should do that too.
> 

Not sure.

> ~~~
> 
> 28. src/bin/psql/tab-complete.c
> 
> @@ -1657,6 +1657,8 @@ psql_completion(const char *text, int start, int end)
>    /* ALTER PUBLICATION <name> ADD */
>    else if (Matches("ALTER", "PUBLICATION", MatchAny, "ADD"))
>    COMPLETE_WITH("ALL TABLES IN SCHEMA", "TABLE");
> + else if (Matches("ALTER", "PUBLICATION", MatchAny, "ADD", "TABLE"))
> + COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
>    /* ALTER PUBLICATION <name> DROP */
> 
> I am not sure about this one- is that change even related to the
> Col-Filter patch or is this some unrelated bugfix?
> 

Yeah, seems unrelated - possibly from a rebase or something. Removed.

> ~~~
> 
> 29. src/include/catalog/pg_publication.h
> 
> @@ -86,6 +86,7 @@ typedef struct Publication
>   typedef struct PublicationRelInfo
>   {
>    Relation relation;
> + List    *columns;
>   } PublicationRelInfo;
> 
> Perhaps that needs some comment. e.g. do you need to mention that a
> NIL List means all columns?
> 

I added a short comment.

> ~~~
> 
> 30. src/include/nodes/parsenodes.h
> 
> @@ -3642,6 +3642,7 @@ typedef struct PublicationTable
>   {
>    NodeTag type;
>    RangeVar   *relation; /* relation to be published */
> + List    *columns; /* List of columns in a publication table */
>   } PublicationTable;
> 
> 
> That comment "List of columns in a publication table" doesn't really
> say anything helpful.
> 
> Perhaps it should mention that a NIL List means all table columns?
> 

Not sure, seems fine.

> ~~~
> 
> 31. src/test/regress/sql/publication.sql
> 
> The regression test file has an uncommon mixture of /* */ and -- style comments.
> 
> Perhaps change all the /* */ ones?
> 

Yeah, that needs some cleanup. I haven't done anything about it yet.

> ~~~
> 
> 32. src/test/regress/sql/publication.sql
> 
> +CREATE TABLE testpub_tbl5 (a int PRIMARY KEY, b text, c text,
> + d int generated always as (a + length(b)) stored);
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, x);  -- error
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (b, c);  -- error
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);  -- error
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, c);  -- ok
> 
> For all these tests (and more) there seems not sufficient explanation
> comments to say exactly what each test case is testing, e.g. *why* is
> an "error" expected for some cases but "ok" for others.
> 

Not sure. I think the error is generally obvious in the expected output.

> ~~~
> 
> 33. src/test/regress/sql/publication.sql
> 
> "-- no dice"
> 
> (??) confusing comment.
> 

Same as for the errors.

> ~~~
> 
> 34. src/test/subscription/t/028_column_list.pl
> 
> I think a few more comments in this TAP file would help to make the
> purpose of the tests more clear.
> 

Yeah, the 0004 patch I shared a couple minutes ago does exactly that.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Race conditions in 019_replslot_limit.pl
Next
From: Nathan Bossart
Date:
Subject: USE_BARRIER_SMGRRELEASE on Linux?