Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Column Filtering in Logical Replication
Date
Msg-id 202112180134.l3sxhe27tlws@alvherre.pgsql
In response to Re: Column Filtering in Logical Replication  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On 2021-Dec-17, Tomas Vondra wrote:

> On 12/17/21 22:07, Alvaro Herrera wrote:
> > So I've been thinking about this as a "security" item (you can see my
> > comments to that effect sprinkled all over this thread), in the sense
> > that if a publication "hides" some column, then the replica just won't
> > get access to it.  But in reality that's mistaken: the filtering that
> > this patch implements is done based on the queries that *the replica*
> > executes at its own volition; if the replica decides to ignore the list
> > of columns, it'll be able to get all columns.  All it takes is an
> > uncooperative replica in order for the lot of data to be exposed anyway.
> 
> Interesting, I haven't really looked at this as a security feature. And in
> my experience if something is not carefully designed to be secure from the
> get go, it's really hard to add that bit later ...

I guess the way to really harden replication is to use the GRANT system
at the publisher's side to restrict access for the replication user.
This would provide actual security.  So you're right that I seem to be
barking up the wrong tree ...  maybe I need to give a careful look at
the documentation for logical replication to understand what is being
offered, and to make sure that we explicitly indicate that limiting the
column list does not provide any actual security.

> You say it's the replica making the decisions, but my mental model is it's
> the publisher decoding the data for a given list of publications (which
> indeed is specified by the subscriber). But the subscriber can't tweak the
> definition of publications, right? Or what do you mean by queries executed
> by the replica? What is the gap?

I am thinking of somebody modifying the code that the replica runs, so
that it ignores the column list that the publication has been configured
to provide; instead of querying only those columns, it would query all
columns.
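
As a toy illustration of that scenario (plain Python, not PostgreSQL
code; the table, its rows, the column list and every function name here
are hypothetical), assume the publisher simply returns whichever columns
the sync query asks for, without checking them against the
publication's column list:

```python
# Hypothetical table on the publisher side.
TABLE = {
    "columns": ["id", "name", "salary"],
    "rows": [(1, "alice", 5000), (2, "bob", 6000)],
}

# Hypothetical column list configured on the publication.
PUBLISHED_COLUMNS = ["id", "name"]


def publisher_copy(requested):
    """Return the requested columns of every row.

    Assumption for this sketch: the request is NOT checked against
    PUBLISHED_COLUMNS, so the filtering depends on the caller.
    """
    idx = [TABLE["columns"].index(col) for col in requested]
    return [tuple(row[i] for i in idx) for row in TABLE["rows"]]


def cooperative_replica():
    # Asks only for the columns the publication is configured to provide.
    return publisher_copy(PUBLISHED_COLUMNS)


def modified_replica():
    # Ignores the column list and asks for every column.
    return publisher_copy(TABLE["columns"])
```

In this model the only thing keeping "salary" away from the replica is
the replica's own choice of query, which is why the column list by
itself cannot count as a security boundary.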

> > If the server has a *separate* security mechanism to hide the columns
> > (per-column privs), it is that feature that will protect the data, not
> > the logical-replication-feature to filter out columns.
> 
> Right. Although I haven't thought about how logical decoding interacts with
> column privileges. I don't think logical decoding actually checks column
> privileges - I certainly don't recall any ACL checks in
> src/backend/replication ...

Well, in practice if you're confronted with a replica that's controlled
by a malicious user that can tweak its behavior, then replica-side
privilege checking won't do anything useful.

> > This led me to realize that the replica-side code in tablesync.c is
> > totally oblivious to what's the publication through which a table is
> > being received from in the replica.  So we're not aware of a replica
> > being exposed only a subset of columns through some specific
> > publication; and a lot more hacking is needed than this patch does, in
> > order to be aware of which publications are being used.

> Does that mean we currently sync all the columns in the initial sync, and
> only start filtering columns later while decoding transactions?

No, it does filter the list of columns in the initial sync.  But the
current implementation is bogus, because it obtains the list of *all*
publications in which the table is published, not just the ones that the
subscription is configured to get data from.  And the sync code doesn't
receive the list of publications.  We need more thorough patching of the
sync code to close that hole.
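
The difference can be sketched with a toy model (plain Python, not the
tablesync.c code; the publication names, the table and the function
names are made up, and combining column lists by union is an assumption
of the sketch):

```python
# Toy catalog: publication name -> {table name -> published column list}.
PUBLICATIONS = {
    "pub_public": {"accounts": ["id", "name"]},
    "pub_internal": {"accounts": ["id", "name", "balance"]},
}


def _merge(column_lists):
    """Union of column lists, keeping first-seen order."""
    merged = []
    for cols in column_lists:
        for col in cols:
            if col not in merged:
                merged.append(col)
    return merged


def columns_all_publications(table):
    """Model of the hole: look at every publication that has the table."""
    return _merge(
        tables[table] for tables in PUBLICATIONS.values() if table in tables
    )


def columns_for_subscription(table, subscribed_pubs):
    """Model of the fix: look only at the subscription's publications."""
    return _merge(
        PUBLICATIONS[pub][table]
        for pub in subscribed_pubs
        if table in PUBLICATIONS.get(pub, {})
    )
```

With a subscription that names only "pub_public", the first function
still returns "balance" for the "accounts" table, while the second
returns just "id" and "name".  The second one needs the list of
subscribed publications as input, which is the piece the sync code does
not currently receive.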

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/