Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Column Filtering in Logical Replication |
Date | |
Msg-id | 670fbd5b-5fc3-8c30-a7bf-ad1d6ea2d4a5@enterprisedb.com |
In response to | Re: Column Filtering in Logical Replication (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Responses | Re: Column Filtering in Logical Replication |
List | pgsql-hackers |
On 12/18/21 02:34, Alvaro Herrera wrote:
> On 2021-Dec-17, Tomas Vondra wrote:
>
>> On 12/17/21 22:07, Alvaro Herrera wrote:
>>> So I've been thinking about this as a "security" item (you can see my
>>> comments to that effect sprinkled all over this thread), in the sense
>>> that if a publication "hides" some column, then the replica just won't
>>> get access to it. But in reality that's mistaken: the filtering that
>>> this patch implements is done based on the queries that *the replica*
>>> executes at its own volition; if the replica decides to ignore the list
>>> of columns, it'll be able to get all columns. All it takes is an
>>> uncooperative replica in order for the lot of data to be exposed anyway.
>>
>> Interesting, I haven't really looked at this as a security feature. And in
>> my experience if something is not carefully designed to be secure from the
>> get go, it's really hard to add that bit later ...
>
> I guess the way to really harden replication is to use the GRANT system
> at the publisher's side to restrict access for the replication user.
> This would provide actual security. So you're right that I seem to be
> barking at the wrong tree ... maybe I need to give a careful look at
> the documentation for logical replication to understand what is being
> offered, and to make sure that we explicitly indicate that limiting the
> column list does not provide any actual security.
>
>> You say it's the replica making the decisions, but my mental model is it's
>> the publisher decoding the data for a given list of publications (which
>> indeed is specified by the subscriber). But the subscriber can't tweak the
>> definition of publications, right? Or what do you mean by queries executed
>> by the replica? What are the gap?
>
> I am thinking in somebody modifying the code that the replica runs, so
> that it ignores the column list that the publication has been configured
> to provide; instead of querying only those columns, it would query all
> columns.
>
>>> If the server has a *separate* security mechanism to hide the columns
>>> (per-column privs), it is that feature that will protect the data, not
>>> the logical-replication-feature to filter out columns.
>>
>> Right. Although I haven't thought about how logical decoding interacts with
>> column privileges. I don't think logical decoding actually checks column
>> privileges - I certainly don't recall any ACL checks in
>> src/backend/replication ...
>
> Well, in practice if you're confronted with a replica that's controlled
> by a malicious user that can tweak its behavior, then replica-side
> privilege checking won't do anything useful.
>

I don't follow. Surely the decoding happens on the primary node, right?
Which is where the ACL checks would happen, using the role the
replication connection is opened with.

>>> This led me to realize that the replica-side code in tablesync.c is
>>> totally oblivious to what's the publication through which a table is
>>> being received from in the replica. So we're not aware of a replica
>>> being exposed only a subset of columns through some specific
>>> publication; and a lot more hacking is needed than this patch does, in
>>> order to be aware of which publications are being used.
>
>> Does that mean we currently sync all the columns in the initial sync, and
>> only start filtering columns later while decoding transactions?
>
> No, it does filter the list of columns in the initial sync. But the
> current implementation is bogus, because it obtains the list of *all*
> publications in which the table is published, not just the ones that the
> subscription is configured to get data from. And the sync code doesn't
> receive the list of publications. We need more thorough patching of the
> sync code to close that hole.

Ah, got it. Thanks for the explanation. Yeah, that makes no sense.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
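
The initial-sync problem described in the quoted text can be illustrated with a small toy model. This is only a sketch and not the tablesync.c code: the publication names, the table, the column names, and both helper functions are hypothetical, and combining column lists as a union is an assumption made purely for illustration.

```python
# Hypothetical model: publication name -> {table name -> published columns}.
publications = {
    "pub_public": {"accounts": ["id", "name"]},
    "pub_internal": {"accounts": ["id", "name", "ssn"]},
}


def columns_all_publications(table):
    """Flawed lookup: consults every publication that contains the table."""
    cols = []
    for tables in publications.values():
        for col in tables.get(table, []):
            if col not in cols:
                cols.append(col)
    return cols


def columns_for_subscription(table, subscribed):
    """Intended lookup: consults only the subscription's publications."""
    cols = []
    for name in subscribed:
        for col in publications[name].get(table, []):
            if col not in cols:
                cols.append(col)
    return cols
```

In this model, a subscription attached only to `pub_public` would still see the `ssn` column through the first function, because the lookup never learns which publications the subscription actually uses. The second function receives that list explicitly, which corresponds to the sync code being told which publications apply.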