Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Rahila Syed
Subject Re: Column Filtering in Logical Replication
Date
Msg-id CAH2L28t7GP7bLnoFa_uS_k9wfX7R2mqPz1-CtZBfJOOG+CM8aQ@mail.gmail.com
Whole thread Raw
In response to Re: Column Filtering in Logical Replication  (Rahila Syed <rahilasyed90@gmail.com>)
Responses Re: Column Filtering in Logical Replication  (Peter Smith <smithpb2250@gmail.com>)
Re: Column Filtering in Logical Replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: Column Filtering in Logical Replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: Column Filtering in Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Column Filtering in Logical Replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
Hi,


Another point is what if someone drops the column used in one of the
publications? Do we want to drop the entire relation from publication
or just remove the column filter or something else?

After thinking about this, I think it is best to remove the entire table from publication,
if a column specified in the column filter is dropped from the table. 
Because, if we drop the entire filter without dropping the table, it means all the columns will be replicated,
and the downstream server table might not have those columns.

If we drop only the column from the filter we might have to recreate the filter and check for replica identity.
That means if the replica identity column is dropped, you can't drop it from the filter,
and might have to drop the entire publication-table mapping anyways. 

Thus, I think it is cleanest to drop the entire relation from publication.

This has been implemented in the attached version.
 
Do we want to consider that the columns specified in the filter must
not have NOT NULL constraint? Because, otherwise, the subscriber will
error out inserting such rows?

I think you mean columns *not* specified in the filter must not have NOT NULL constraint
on the subscriber, as this will break during insert, as it will try to insert NULL for columns
not sent by the publisher.
I will look into fixing this. Probably this won't be a problem in
case the column is auto generated or contains a default value.
 
I am not sure if this needs to be handled. Ideally, we need to prevent the subscriber tables from having a NOT NULL
constraint if the publisher uses column filters to publish the values of the table. There is no way
to do this at the time of creating a table on subscriber.
As this involves querying the publisher for this information, it can be done at the time of initial table synchronization.
i.e error out if any of the subscribed tables has NOT NULL constraint on non-filter columns. 
This will lead to the user dropping and recreating the subscription after removing the
NOT NULL constraint from the table.
I think the same can be achieved by doing nothing and letting the subscriber error out while inserting rows.

Minor comments:
================
  pq_sendbyte(out, flags);
-
  /* attribute name */
  pq_sendstring(out, NameStr(att->attname));

@@ -953,6 +1000,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)

  /* attribute mode */
  pq_sendint32(out, att->atttypmod);
+
  }

  bms_free(idattrs);
diff --git a/src/backend/replication/logical/relation.c
b/src/backend/replication/logical/relation.c
index c37e2a7e29..d7a7b00841 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -354,7 +354,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid,
LOCKMODE lockmode)

  attnum = logicalrep_rel_att_by_name(remoterel,
  NameStr(attr->attname));
-
  entry->attrmap->attnums[i] = attnum;

There are quite a few places in the patch that contains spurious line
additions or removals.


Fixed these in the attached patch.

Having said that, I'm not sure I agree with this design decision; what I
think this is doing is hiding from the user the fact that they are
publishing columns that they don't want to publish.  I think as a user I
would rather get an error in that case:

  ERROR:  invalid column list in published set
  DETAIL:  The set of published commands does not include all the replica identity columns.

Added this.

Also added some more tests. Please find attached a rebased and updated patch. 
 
Thank you,
Rahila Syed
Attachment

pgsql-hackers by date:

Previous
From: Zhihong Yu
Date:
Subject: Re: [PATCH] Support pg_ident mapping for LDAP
Next
From: Jacob Champion
Date:
Subject: Re: [PATCH] Support pg_ident mapping for LDAP