Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Rahila Syed
Subject Re: Column Filtering in Logical Replication
Date
Msg-id CAH2L28stHzf6Yg3KxBEJ1JEr8C6xM4oSZfysdUObauZ00ArdWQ@mail.gmail.com
Whole thread Raw
In response to Re: Column Filtering in Logical Replication  (Rahila Syed <rahilasyed90@gmail.com>)
Responses Re: Column Filtering in Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi,

 
>  
> Currently, this capability is not included in the patch. If the table on
> the subscriber
> server has lesser attributes than that on the publisher server, it
> throws an error at the 
> time of CREATE SUBSCRIPTION.
>

That's a bit surprising, to be honest. I do understand the patch simply
treats the filtered columns as "unchanged" because that's the simplest
way to filter the *data* of the columns. But if someone told me we can
"filter columns" I'd expect this to work without the columns on the
subscriber.

OK, I will look into adding this. 

This has been added in the attached patch. Now, instead of 
treating the filtered columns as unchanged and sending a byte
with that information, unfiltered columns are not sent to the subscriber 
server at all. This along with saving the network bandwidth, allows 
the logical replication to even work between tables with different numbers of 
columns i.e with the table on subscriber server containing only the filtered 
columns. Currently, replica identity columns are replicated irrespective of 
the presence of the column filters, hence the table on the subscriber side  must 
contain the replica identity columns.

The patch adds a new parse node PublicationTable, but doesn't add
copyfuncs.c, equalfuncs.c, readfuncs.c, outfuncs.c support for it.

Thanks, added this. 
 
Looking at OpenTableList(), I think you forgot to update the comment --
it says "open relations specified by a RangeVar list", 

Thank you for the review, Modified this.

To nitpick, I find "Bitmapset *att_list" a bit annoying, because it's
not really a list ;-)

Changed this.

 It's not super clear to me that strlist_to_textarray() and related
processing will behave sanely when the column names contain weird
characters such as commas or quotes, or just when used with uppercase
column names.  Maybe it's worth having tests that try to break such
cases.
 
Added a few test cases for this. 

In AlterPublicationTables() I was confused by some code that seemed
commented a bit too verbosely 
 
Modified this as per the suggestion.

: REPLICA IDENTITY columns are always replicated
: irrespective of column names specification.

... for which you don't have any tests 

I have added these tests.

Having said that, I'm not sure I agree with this design decision; what I
think this is doing is hiding from the user the fact that they are
publishing columns that they don't want to publish.  I think as a user I
would rather get an error in that case:

  ERROR:  invalid column list in published set
  DETAIL:  The set of published commands does not include all the replica identity columns.

or something like that.  Avoid possible nasty surprises of security-
leaking nature.

Ok, Thank you for your opinion. I agree that giving an explicit error in this case will be safer.  
I will include this, in case there are no counter views.

Thank you for your review comments. Please find attached the rebased and updated patch. 
 

Thank you,
Rahila Syed 

Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
Next
From: Tom Lane
Date:
Subject: Re: Another regexp performance improvement: skip useless paren-captures