Thread: Proposal: Deferred Replica Filtering for PostgreSQL Logical Replication
Hi hackers,
I'd like to propose an enhancement to PostgreSQL's logical replication system: Deferred Replica Filtering (DRF). The goal of this feature is to provide more granular control over which rows are replicated by applying publication filters after the WAL decoding process, before sending data to subscribers.
Currently, PostgreSQL's logical replication filters apply deterministically. Deferred filtering, however, operates after the WAL has been decoded, giving it access to the complete row data and making filtering decisions based on mutable values. Additionally, record columns may be omitted by the filter.
This opens up several possibilities for granular control. Consider the following examples:
Alice and Bob subscribe to changes on a table with RLS enabled, allowing CRUD operations based on user's IDs.
1. Alice needs to know the timestamp at which Bob updated the table. With DRF, we can omit all columns except for the timestamp.
2. Bob wants to track DELETEs on the table. Without DRF, Bob can see all columns on any deleted row, potentially exposing complete records he shouldn't be authorized to view. DRF can filter these rows out.
Deferred replica filtering allows for session-specific, per-row, and per-column filtering - features currently not supported by existing replication filters, enhancing security and data privacy.
I look forward to hearing your thoughts!
Best,
Dean S
On Sun, Mar 16, 2025 at 12:59 AM Dean <ds.blue797@gmail.com> wrote: > > I'd like to propose an enhancement to PostgreSQL's logical replication system: Deferred Replica Filtering (DRF). The goalof this feature is to provide more granular control over which rows are replicated by applying publication filters afterthe WAL decoding process, before sending data to subscribers. > > Currently, PostgreSQL's logical replication filters apply deterministically. Deferred filtering, however, operates afterthe WAL has been decoded, giving it access to the complete row data and making filtering decisions based on mutablevalues. Additionally, record columns may be omitted by the filter. > > This opens up several possibilities for granular control. Consider the following examples: > Alice and Bob subscribe to changes on a table with RLS enabled, allowing CRUD operations based on user's IDs. > 1. Alice needs to know the timestamp at which Bob updated the table. With DRF, we can omit all columns except for the timestamp. > 2. Bob wants to track DELETEs on the table. Without DRF, Bob can see all columns on any deleted row, potentially exposingcomplete records he shouldn't be authorized to view. DRF can filter these rows out. > > Deferred replica filtering allows for session-specific, per-row, and per-column filtering - features currently not supportedby existing replication filters, enhancing security and data privacy. > We provide column lists [1] and row filters [2]. Doesn't that suffice the need, if not, kindly let us know what exactly you need with some examples. [1] - https://www.postgresql.org/docs/devel/logical-replication-col-lists.html [2] - https://www.postgresql.org/docs/devel/logical-replication-row-filter.html -- With Regards, Amit Kapila.
Unfortunately, neither column lists nor row filters can provide the level of control I'm proposing. These revised examples might help illustrate the use case for DRF:
Alice, Bob, and Eve subscribe to changes on a `friend_requests` table. Row-level security ensures CRUD access based on user IDs.
1. Per-subscriber column control: Bob makes a change to the table. Alice should receive the entire record, while Eve should only receive the timestamp - no other columns. Why DRF is needed: Column lists are static and apply equally to all subscribers, meaning we can't distinguish Alice's subscription from Eve's.
2. Bob DELETEs a row from the table. Alice should see the DELETE event, while Eve should not even be aware of an event. Why DRF is needed: The deterministic nature of row filters makes them unsuitable for per-subscriber filtering based on session data.
The goal of DRF is to allow per-subscriber variations in change broadcasts, enabling granular control over what data is sent to each subscriber based on their session context.
Best,
Dean S
On Mon, Mar 17, 2025 at 4:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Mar 16, 2025 at 12:59 AM Dean <ds.blue797@gmail.com> wrote:
>
> I'd like to propose an enhancement to PostgreSQL's logical replication system: Deferred Replica Filtering (DRF). The goal of this feature is to provide more granular control over which rows are replicated by applying publication filters after the WAL decoding process, before sending data to subscribers.
>
> Currently, PostgreSQL's logical replication filters apply deterministically. Deferred filtering, however, operates after the WAL has been decoded, giving it access to the complete row data and making filtering decisions based on mutable values. Additionally, record columns may be omitted by the filter.
>
> This opens up several possibilities for granular control. Consider the following examples:
> Alice and Bob subscribe to changes on a table with RLS enabled, allowing CRUD operations based on user's IDs.
> 1. Alice needs to know the timestamp at which Bob updated the table. With DRF, we can omit all columns except for the timestamp.
> 2. Bob wants to track DELETEs on the table. Without DRF, Bob can see all columns on any deleted row, potentially exposing complete records he shouldn't be authorized to view. DRF can filter these rows out.
>
> Deferred replica filtering allows for session-specific, per-row, and per-column filtering - features currently not supported by existing replication filters, enhancing security and data privacy.
>
We provide column lists [1] and row filters [2]. Doesn't that suffice
the need, if not, kindly let us know what exactly you need with some
examples.
[1] - https://www.postgresql.org/docs/devel/logical-replication-col-lists.html
[2] - https://www.postgresql.org/docs/devel/logical-replication-row-filter.html
--
With Regards,
Amit Kapila.
RE: Proposal: Deferred Replica Filtering for PostgreSQL Logical Replication
From
"Zhijie Hou (Fujitsu)"
Date:
On 2025/03/19 10:00:00 </o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=3da34e28d9d240d4abadbbb549f9ff21-houzj.fnst>wrote: On Wed, Mar 19, 2025 at 8:56 AM Dean wrote: > Unfortunately, neither column lists nor row filters can provide the level of > control I'm proposing. These revised examples might help illustrate the use > case for DRF: > > Alice, Bob, and Eve subscribe to changes on a `friend_requests` table. > Row-level security ensures CRUD access based on user IDs. > > 1. Per-subscriber column control: Bob makes a change to the table. Alice > should receive the entire record, while Eve should only receive the > timestamp - no other columns. Why DRF is needed: Column lists are static > and apply equally to all subscribers, meaning we can't distinguish Alice's > subscription from Eve's. I would like to confirm the concept to ensure I understand it correctly. If your goal is for the two subscriptions, Alice and Eve, to utilize different column lists, you can achieve this by creating two separate publications: Alice_pub and Eve_pub. You can specify the desired column list for the table in Eve_pub. Consequently, the subscription Alice will subscribe to Alice_pub, receiving the full record, while Eve will subscribe to Eve_pub, receiving a limited set of columns. If this does not fully meet your requirements, could you please provide additional details or clarify your specific needs ? > 2. Bob DELETEs a row from the table. Alice should see the DELETE event, while > Eve should not even be aware of an event. Why DRF is needed: The > deterministic nature of row filters makes them unsuitable for > per-subscriber filtering based on session data. > > > The goal of DRF is to allow per-subscriber variations in change broadcasts, > enabling granular control over what data is sent to each subscriber based on > their session context. Same as above, it would be helpful to elaborate a bit more. Best Regards, Hou zj
Re: Proposal: Deferred Replica Filtering for PostgreSQL Logical Replication
From
"Euler Taveira"
Date:
On Tue, Mar 18, 2025, at 9:56 PM, Dean wrote:
Unfortunately, neither column lists nor row filters can provide the level of control I'm proposing. These revised examples might help illustrate the use case for DRF:
I'm afraid I didn't understand your proposal. Are you trying to use logical
replication with RLS enabled on subscriber?
Alice, Bob, and Eve subscribe to changes on a `friend_requests` table. Row-level security ensures CRUD access based on user IDs.1. Per-subscriber column control: Bob makes a change to the table. Alice should receive the entire record, while Eve should only receive the timestamp - no other columns. Why DRF is needed: Column lists are static and apply equally to all subscribers, meaning we can't distinguish Alice's subscription from Eve's.2. Bob DELETEs a row from the table. Alice should see the DELETE event, while Eve should not even be aware of an event. Why DRF is needed: The deterministic nature of row filters makes them unsuitable for per-subscriber filtering based on session data.The goal of DRF is to allow per-subscriber variations in change broadcasts, enabling granular control over what data is sent to each subscriber based on their session context.
You misunderstood the logical replication architecture. The filtering is
applied *after* the WAL is decoded. See change_cb -- pgoutput_change().
You mentioned RLS but AFAICS it cannot replicate or do an initial
synchronization to a table if RLS is enabled.
See TargetPrivilegesCheck() -- worker.c.
/*
* We lack the infrastructure to honor RLS policies. It might be possible
* to add such infrastructure here, but tablesync workers lack it, too, so
* we don't bother. RLS does not ordinarily apply to TRUNCATE commands,
* but it seems dangerous to replicate a TRUNCATE and then refuse to
* replicate subsequent INSERTs, so we forbid all commands the same.
*/
if (check_enable_rls(relid, InvalidOid, false) == RLS_ENABLED)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("user \"%s\" cannot replicate into relation with row-level security enabled: \"%s\"",
GetUserNameFromId(GetUserId(), true),
RelationGetRelationName(rel))));
See LogicalRepSyncTableStart() -- tablesync.c.
/*
* COPY FROM does not honor RLS policies. That is not a problem for
* subscriptions owned by roles with BYPASSRLS privilege (or superuser,
* who has it implicitly), but other roles should not be able to
* circumvent RLS. Disallow logical replication into RLS enabled
* relations for such roles.
*/
if (check_enable_rls(RelationGetRelid(rel), InvalidOid, false) == RLS_ENABLED)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("user \"%s\" cannot replicate into relation with row-level security enabled: \"%s\"",
GetUserNameFromId(GetUserId(), true),
RelationGetRelationName(rel))));
The comments already point out directions. Feel free to write a proposal for
it.
Re: Proposal: Deferred Replica Filtering for PostgreSQL Logical Replication
From
"David G. Johnston"
Date:
On Sat, Mar 15, 2025 at 12:29 PM Dean <ds.blue797@gmail.com> wrote:
Currently, PostgreSQL's logical replication filters apply deterministically. Deferred filtering, however, operates after the WAL has been decoded, giving it access to the complete row data and making filtering decisions based on mutable values. Additionally, record columns may be omitted by the filter.Deferred replica filtering allows for session-specific, per-row, and per-column filtering - features currently not supported by existing replication filters, enhancing security and data privacy.
We haven't even overcome this yet:
There are currently no privileges on publications. Any subscription (that is able to connect) can access any publication. Thus, if you intend to hide some information from particular subscribers, such as by using row filters or column lists, or by not adding the whole table to the publication, be aware that other publications in the same database could expose the same information. Publication privileges might be added to PostgreSQL in the future to allow for finer-grained access control.
You seem to be quite a few steps ahead.
That said:
The role used for the replication connection must have the REPLICATION attribute (or be a superuser). If the role lacks SUPERUSER and BYPASSRLS, publisher row security policies can execute. If the role does not trust all table owners, include options=-crow_security=off in the connection string; if a table owner then adds a row security policy, that setting will cause replication to halt rather than execute the policy. Access for the role must be configured in pg_hba.conf and it must have the LOGIN attribute.
So RLS is taken into consideration by the publication when sending changes to a subscription. Though I wouldn't be surprised if there are edge cases that could be worked on.
David J.