Thread: Feature Recommendations for Logical Subscriptions
Business Scenario:
The BI department requires real-time data from the operational database. In our current approach (on platform 14), we create a separate database within our department's real-time backup instance, set up a logical replication account, replicate required tables to this isolated database via logical replication, and then create a dedicated account with column-level permissions on specific tables for the BI department.
Recommendations:
1、Column Filtering Functionality: During implementation, some tables may contain sensitive data or long fields (e.g., text columns), while other fields in these tables still need to be replicated to another database or instance. Manually specifying individual columns becomes cumbersome, especially for tables with many fields, and complicates future field additions. We recommend adding a column filtering feature to logical replication to streamline this process.
2、Logical Replication Account Permissions:
Logical replication permissions should be decoupled from general database access permissions.
Proposed workflow:
Create a dedicated account with logical replication privileges only.
Create a logical replication slot and grant this account access only to the authorized replication slot.
This account would have no direct access to the database itself but could subscribe to and consume data from the permitted replication slot.
This approach allows securely providing the account to the BI department. They can subscribe to the replication slot and perform downstream processing independently, without risking exposure of unauthorized data.
On Tue, Apr 8, 2025 at 2:48 PM YeXiu <1518981153@qq.com> wrote: > > Business Scenario: > The BI department requires real-time data from the operational database. In our current approach (on platform 14), we createa separate database within our department's real-time backup instance, set up a logical replication account, replicaterequired tables to this isolated database via logical replication, and then create a dedicated account with column-levelpermissions on specific tables for the BI department. > > Recommendations: > > 1、Column Filtering Functionality: During implementation, some tables may contain sensitive data or long fields (e.g.,text columns), while other fields in these tables still need to be replicated to another database or instance. Manuallyspecifying individual columns becomes cumbersome, especially for tables with many fields, and complicates futurefield additions. We recommend adding a column filtering feature to logical replication to streamline this process. > It would have been better if you could provide some examples. Let me try to describe by example. We have a feature where users can specify columns they want to replicate. For example: Create a publication that publishes all changes for table users, but replicates only columns user_id and firstname: CREATE PUBLICATION users_filtered FOR TABLE users (user_id, firstname); As per my understanding, you are expecting a feature where we can specify columns that won't be replicated. For example say a table t1 has columns c1, c2, c3, ..., c10. Now, the user would like to replicate all columns except c9 and c10, so he should be allowed to do so by something like CREATE PUBLICATION users_filtered FOR TABLE t1 Except (c9, c10). Is that correct or you have something else in mind? > > 2、Logical Replication Account Permissions: > Logical replication permissions should be decoupled from general database access permissions. > Proposed workflow: > Create a dedicated account with logical replication privileges only. > Create a logical replication slot and grant this account access only to the authorized replication slot. > This account would have no direct access to the database itself but could subscribe to and consume data from the permittedreplication slot. > This approach allows securely providing the account to the BI department. They can subscribe to the replication slot andperform downstream processing independently, without risking exposure of unauthorized data. > We need to access catalog tables in the database while decoding changes, so won't some interaction with database privileges still be required? BTW, are you planning to work on a patch on these proposals or you expect someone else in the community to work on these proposals? -- With Regards, Amit Kapila.
For example:
Assume database db1 has a user table with columns c1, c2, c3, ..., c10, telphone, and content (where telphone is a sensitive data field, and content is of type text).
1.We need to synchronize the user table to the BI department, but they should not have access to the telphone column due to sensitivity. The content column is also unnecessary for BI as it is too long and lacks analytical value.
During synchronization, we need to exclude both telphone and content columns. However, the user table may continue to add new columns (e.g., c11, c12) in the future.
The current approach is:
CREATE PUBLICATION pub FOR TABLE public.user (c1, c2, c3, ..., c10);
When new columns like c11 or c12 are added, we must manually update the publication:
ALTER PUBLICATION pub SET TABLE public.user (c1, c2, c3, ..., c10, c11, c12);
This repetitive work is inefficient. I suggest using the EXCEPT syntax as you mentioned earlier:
CREATE PUBLICATION pub FOR TABLE public.user EXCEPT (telphone, content);
This would automatically exclude sensitive or unnecessary columns, even as new columns are added. Additionally, we need a method to modify the exclusion list dynamically, such as:
ALTER PUBLICATION pub SET TABLE public.user EXCEPT (telphone);
-- or --
ALTER PUBLICATION pub SET TABLE public.user EXCEPT (telphone, content, c11);
2. Second Issue:
This extends the scenario above. I’m unsure how permissions are currently handled.
You mentioned creating a dedicated account u1 with only logical replication privileges, setting up a replication slot, and granting access to this slot.
My question: If the telphone column is excluded from the publication, does the subscriber still receive or parse data for telphone? Or is the column entirely absent from the replication slot?
If the replication slot does NOT include telphone data, this is a non-issue and can be ignored.
3. I hope others in the community can address these suggestions, as I am not a C developer and cannot implement them myself.
原始邮件
发件人:Amit Kapila <amit.kapila16@gmail.com> 发件时间:2025年4月9日 18:44 收件人:YeXiu <1518981153@qq.com> 抄送:pgsql-hackers <pgsql-hackers@lists.postgresql.org> 主题:Re: Feature Recommendations for Logical Subscriptions |
>
> Business Scenario:
> The BI department requires real-time data from the operational database. In our current approach (on platform 14), we create a separate database within our department's real-time backup instance, set up a logical replication account, replicate required tables to this isolated database via logical replication, and then create a dedicated account with column-level permissions on specific tables for the BI department.
>
> Recommendations:
>
> 1、Column Filtering Functionality: During implementation, some tables may contain sensitive data or long fields (e.g., text columns), while other fields in these tables still need to be replicated to another database or instance. Manually specifying individual columns becomes cumbersome, especially for tables with many fields, and complicates future field additions. We recommend adding a column filtering feature to logical replication to streamline this process.
>
It would have been better if you could provide some examples. Let me
try to describe by example. We have a feature where users can specify
columns they want to replicate. For example: Create a publication that
publishes all changes for table users, but replicates only columns
user_id and firstname:
CREATE PUBLICATION users_filtered FOR TABLE users (user_id, firstname);
As per my understanding, you are expecting a feature where we can
specify columns that won't be replicated. For example say a table t1
has columns c1, c2, c3, ..., c10. Now, the user would like to
replicate all columns except c9 and c10, so he should be allowed to do
so by something like CREATE PUBLICATION users_filtered FOR TABLE t1
Except (c9, c10). Is that correct or you have something else in mind?
>
> 2、Logical Replication Account Permissions:
> Logical replication permissions should be decoupled from general database access permissions.
> Proposed workflow:
> Create a dedicated account with logical replication privileges only.
> Create a logical replication slot and grant this account access only to the authorized replication slot.
> This account would have no direct access to the database itself but could subscribe to and consume data from the permitted replication slot.
> This approach allows securely providing the account to the BI department. They can subscribe to the replication slot and perform downstream processing independently, without risking exposure of unauthorized data.
>
We need to access catalog tables in the database while decoding
changes, so won't some interaction with database privileges still be
required?
BTW, are you planning to work on a patch on these proposals or you
expect someone else in the community to work on these proposals?
--
With Regards,
Amit Kapila.
On Thu, Apr 10, 2025 at 10:12 AM YeXiu <1518981153@qq.com> wrote:
For example:Assume database db1 has a user table with columns c1, c2, c3, ..., c10, telphone, and content (where telphone is a sensitive data field, and content is of type text).1.We need to synchronize the user table to the BI department, but they should not have access to the telphone column due to sensitivity. The content column is also unnecessary for BI as it is too long and lacks analytical value.During synchronization, we need to exclude both telphone and content columns. However, the user table may continue to add new columns (e.g., c11, c12) in the future.The current approach is:CREATE PUBLICATION pub FOR TABLE public.user (c1, c2, c3, ..., c10);When new columns like c11 or c12 are added, we must manually update the publication:ALTER PUBLICATION pub SET TABLE public.user (c1, c2, c3, ..., c10, c11, c12);This repetitive work is inefficient. I suggest using the EXCEPT syntax as you mentioned earlier:CREATE PUBLICATION pub FOR TABLE public.user EXCEPT (telphone, content);This would automatically exclude sensitive or unnecessary columns, even as new columns are added. Additionally, we need a method to modify the exclusion list dynamically, such as:ALTER PUBLICATION pub SET TABLE public.user EXCEPT (telphone);-- or --ALTER PUBLICATION pub SET TABLE public.user EXCEPT (telphone, content, c11);2. Second Issue:This extends the scenario above. I’m unsure how permissions are currently handled.You mentioned creating a dedicated account u1 with only logical replication privileges, setting up a replication slot, and granting access to this slot.My question: If the telphone column is excluded from the publication, does the subscriber still receive or parse data for telphone? Or is the column entirely absent from the replication slot?If the replication slot does NOT include telphone data, this is a non-issue and can be ignored.
We don't send the telephone data to the subscriber. So the subscriber would never need to parse it.
3. I hope others in the community can address these suggestions, as I am not a C developer and cannot implement them myself.
Okay, we previously discussed this feature, but due to a lack of interest, it has been dropped. You can join that thread [1] and help with testing and feature specifications.
With Regards,
Amit Kapila.
Amit Kapila.
Hi, FYI, the Column List documentation [1] says ------ However, do not rely on this feature for security: a malicious subscriber is able to obtain data from columns that are not specifically published. If security is a consideration, protections can be applied at the publisher side. ------ IIRC, this was something to do with how the COPY done by the initial table sync might be manipulated by a malicious subscriber. I think you can find more details about this in the original thread when Column Lists were introduced. e.g. try searching this [2] thread for the word "security". ====== [1] https://www.postgresql.org/docs/current/logical-replication-col-lists.html [2] https://www.postgresql.org/message-id/flat/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB%3D_ektNRH8NJ1jf95g%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Apr 11, 2025 at 3:40 AM Peter Smith <smithpb2250@gmail.com> wrote: > > FYI, the Column List documentation [1] says > ------ > However, do not rely on this feature for security: a malicious > subscriber is able to obtain data from columns that are not > specifically published. If security is a consideration, protections > can be applied at the publisher side. > ------ > > IIRC, this was something to do with how the COPY done by the initial > table sync might be manipulated by a malicious subscriber. I think you > can find more details about this in the original thread when Column > Lists were introduced. e.g. try searching this [2] thread for the word > "security". > The same thing applies here as well. The only key difference is user convenience in two ways: (a) when there are a lot of columns, say 100 columns, and user would like to send all data except 2 columns, (b) adding new columns to table would require users to again run the DDL to change the column list. These are primarily the two pain points YeXiu wants us to solve. YeXiu, if I misunderstood your intention, feel free to add. -- With Regards, Amit Kapila.
Amit Kapila, Yes, as you mentioned, but I’d like to add that when using the exclusion method for newly added columns, there’s no need to modify the publication. This is similar to how fields are automatically synchronized when columns are unspecified during initial setup. This is also a key reason why this approach is valuable.
原始邮件
发件人:Amit Kapila <amit.kapila16@gmail.com> 发件时间:2025年4月11日 12:00 收件人:Peter Smith <smithpb2250@gmail.com> 抄送:YeXiu <1518981153@qq.com>, pgsql-hackers <pgsql-hackers@lists.postgresql.org> 主题:Re: Feature Recommendations for Logical Subscriptions |
>
> FYI, the Column List documentation [1] says
> ------
> However, do not rely on this feature for security: a malicious
> subscriber is able to obtain data from columns that are not
> specifically published. If security is a consideration, protections
> can be applied at the publisher side.
> ------
>
> IIRC, this was something to do with how the COPY done by the initial
> table sync might be manipulated by a malicious subscriber. I think you
> can find more details about this in the original thread when Column
> Lists were introduced. e.g. try searching this [2] thread for the word
> "security".
>
The same thing applies here as well. The only key difference is user
convenience in two ways: (a) when there are a lot of columns, say 100
columns, and user would like to send all data except 2 columns, (b)
adding new columns to table would require users to again run the DDL
to change the column list.
These are primarily the two pain points YeXiu wants us to solve.
YeXiu, if I misunderstood your intention, feel free to add.
--
With Regards,
Amit Kapila.
Another permission-related issue involves scenarios where multiple logical replication slots exist. If a replication slot grants full data access permissions and user accounts are not explicitly bound to specific slots, there could be security risks where accounts might connect to high-privilege replication slots, potentially leading to data security vulnerabilities.
原始邮件
发件人:Amit Kapila <amit.kapila16@gmail.com> 发件时间:2025年4月11日 12:00 收件人:Peter Smith <smithpb2250@gmail.com> 抄送:YeXiu <1518981153@qq.com>, pgsql-hackers <pgsql-hackers@lists.postgresql.org> 主题:Re: Feature Recommendations for Logical Subscriptions |
>
> FYI, the Column List documentation [1] says
> ------
> However, do not rely on this feature for security: a malicious
> subscriber is able to obtain data from columns that are not
> specifically published. If security is a consideration, protections
> can be applied at the publisher side.
> ------
>
> IIRC, this was something to do with how the COPY done by the initial
> table sync might be manipulated by a malicious subscriber. I think you
> can find more details about this in the original thread when Column
> Lists were introduced. e.g. try searching this [2] thread for the word
> "security".
>
The same thing applies here as well. The only key difference is user
convenience in two ways: (a) when there are a lot of columns, say 100
columns, and user would like to send all data except 2 columns, (b)
adding new columns to table would require users to again run the DDL
to change the column list.
These are primarily the two pain points YeXiu wants us to solve.
YeXiu, if I misunderstood your intention, feel free to add.
--
With Regards,
Amit Kapila.