Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Column Filtering in Logical Replication
Date
Msg-id 670fbd5b-5fc3-8c30-a7bf-ad1d6ea2d4a5@enterprisedb.com
In response to Re: Column Filtering in Logical Replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Column Filtering in Logical Replication
List pgsql-hackers

On 12/18/21 02:34, Alvaro Herrera wrote:
> On 2021-Dec-17, Tomas Vondra wrote:
> 
>> On 12/17/21 22:07, Alvaro Herrera wrote:
>>> So I've been thinking about this as a "security" item (you can see my
>>> comments to that effect sprinkled all over this thread), in the sense
>>> that if a publication "hides" some column, then the replica just won't
>>> get access to it.  But in reality that's mistaken: the filtering that
>>> this patch implements is done based on the queries that *the replica*
>>> executes at its own volition; if the replica decides to ignore the list
>>> of columns, it'll be able to get all columns.  All it takes is an
>>> uncooperative replica in order for the lot of data to be exposed anyway.
>>
>> Interesting, I haven't really looked at this as a security feature. And in
>> my experience if something is not carefully designed to be secure from the
>> get go, it's really hard to add that bit later ...
> 
> I guess the way to really harden replication is to use the GRANT system
> at the publisher's side to restrict access for the replication user.
> This would provide actual security.  So you're right that I seem to be
> barking up the wrong tree ...  maybe I need to take a careful look at
> the documentation for logical replication to understand what is being
> offered, and to make sure that we explicitly indicate that limiting the
> column list does not provide any actual security.
> 
>> You say it's the replica making the decisions, but my mental model is it's
>> the publisher decoding the data for a given list of publications (which
>> indeed is specified by the subscriber). But the subscriber can't tweak the
>> definition of publications, right? Or what do you mean by queries executed
>> by the replica? What is the gap?
> 
> I am thinking of somebody modifying the code that the replica runs, so
> that it ignores the column list that the publication has been configured
> to provide; instead of querying only those columns, it would query all
> columns.
> 
>>> If the server has a *separate* security mechanism to hide the columns
>>> (per-column privs), it is that feature that will protect the data, not
>>> the logical-replication-feature to filter out columns.
>>
>> Right. Although I haven't thought about how logical decoding interacts with
>> column privileges. I don't think logical decoding actually checks column
>> privileges - I certainly don't recall any ACL checks in
>> src/backend/replication ...
> 
> Well, in practice if you're confronted with a replica that's controlled
> by a malicious user that can tweak its behavior, then replica-side
> privilege checking won't do anything useful.
> 

I don't follow. Surely the decoding happens on the primary node, right? 
Which is where the ACL checks would happen, using the role the 
replication connection is opened with.

>>> This led me to realize that the replica-side code in tablesync.c is
>>> totally oblivious to which publication a table is being received
>>> through in the replica.  So we're not aware of a replica
>>> being exposed only a subset of columns through some specific
>>> publication; and a lot more hacking is needed than this patch does, in
>>> order to be aware of which publications are being used.
> 
>> Does that mean we currently sync all the columns in the initial sync, and
>> only start filtering columns later while decoding transactions?
> 
> No, it does filter the list of columns in the initial sync.  But the
> current implementation is bogus, because it obtains the list of *all*
> publications in which the table is published, not just the ones that the
> subscription is configured to get data from.  And the sync code doesn't
> receive the list of publications.  We need more thorough patching of the
> sync code to close that hole.

Ah, got it. Thanks for the explanation. Yeah, that makes no sense.
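
To make the gap concrete, here is a toy Python model of the difference
between taking the column list from every publication that contains the
table and taking it only from the publications the subscription names.
This is not PostgreSQL code: the publication names, the table, and the
helper functions are all made up for illustration, and it assumes (only
for this sketch) that column lists from several publications are
combined as a union.

```python
# Toy model only: every name here is hypothetical, not a PostgreSQL internal.
ALL_COLUMNS = {"id", "name", "salary"}

# publication name -> {table name -> column list (None means "all columns")}
PUBLICATIONS = {
    "pub_public": {"employees": {"id", "name"}},
    "pub_payroll": {"employees": {"id", "salary"}},
}


def columns_for_sync(table, publications):
    """Union of the column lists of `table` across the given publications."""
    columns = set()
    for pub in publications:
        tables = PUBLICATIONS[pub]
        if table not in tables:
            continue
        column_list = tables[table]
        if column_list is None:  # no column list: everything is published
            return set(ALL_COLUMNS)
        columns |= column_list
    return columns


def buggy_sync_columns(table):
    # Looks at every publication containing the table, subscribed or not.
    return columns_for_sync(table, PUBLICATIONS.keys())


def fixed_sync_columns(table, subscribed_publications):
    # Looks only at the publications the subscription was created with.
    return columns_for_sync(table, subscribed_publications)


subscribed = ["pub_public"]
print(sorted(buggy_sync_columns("employees")))             # ['id', 'name', 'salary']
print(sorted(fixed_sync_columns("employees", subscribed)))  # ['id', 'name']
```

In this model a subscription to pub_public alone would still copy
"salary" during the initial sync under the first variant, which is the
hole described above; the second variant only works if the sync code is
actually given the list of subscribed publications.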


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


