Thread: Commitfest 2021-11 Patch Triage - Part 2
Below is the second pass over the patches in the current commitfest, this time the
patches which have been around for 5 or 6 commitfests.  The usual disclaimers that
I very well might have missed something still apply.

2773: libpq compression
=======================
This patch intended to provide libpq connection compression to "replace SSL
compression", which was doomed when the patch was written and has since been
removed altogether.  The initial approach didn't get much traction, but there was
significant discussion and work, which has since fizzled out.  The patch has been
updated, but there hasn't been meaningful review in the past months, and the last
comments seem to imply there is a fair number of question marks left in here.
Robert, having been very involved in this, do you have any thoughts on where we
are and where to go (if at all IYO)?

2765: Add extra statistics to explain for Nested Loop
=====================================================
There has been almost exclusively positive feedback on this idea, and the patch
has matured over several rounds of fixes.  Any committer keen on picking this up
and getting it across the finish line?

2780: Allow COPY "text" to output a header and add header matching mode to COPY FROM
=====================================================================================
The original patch was rejected for lacking matching, but that has since been
addressed in an updated patch which has seen a lot of review.  Reading the thread,
it seems this should really be in Ready for Committer.

2815: CREATE INDEX CONCURRENTLY on partitioned table
====================================================
This spawned off from the work on REINDEX CONCURRENTLY but seems to have stalled
and is awaiting more review.

2800: CLUSTER on partitioned table
==================================
Like the patch above, it too spawned off from the REINDEX CONCURRENTLY thread but
seems to have stalled awaiting more review.  There was scepticism raised over the
usefulness vs code weight tradeoff, but the responses to this haven't really been
followed up on.  Alvaro and Michael, do you have any changed thoughts on Justin's
patch reorg, or do you still feel this isn't worth going ahead with?

2897: Faster pglz compression
=============================
Tomas did benchmarking recently which shows promising speedups, and he is actively
working on getting it ready for commit.

2906: Row filtering for logical replication
===========================================
This patch is undergoing lots of active review.

2831: Extended statistics / estimate Var op Var clauses
=======================================================
The patch and approach have been approved by reviewers; it's down to the final
push over the finish line from what I can gather.  Seems to be awaiting benchmarks
by Tomas on the relative costs of the new operations in planning.

2918: New default role allowing to change per-role/database settings
====================================================================
Stephen, you had comments on the patch which have since been addressed.  Any
thoughts on the new revision?

2911: Dynamic result sets from procedures
=========================================
This patch is gated on the SHOW_ALL_RESULTS patch, and is thus not moving until
there is further progress on that one.

2901: SQL/JSON: functions
2901: SQL/JSON: JSON_TABLE
==========================
Grouping these two together as they are in very similar states.  The patches seem
to mostly move forward by rebases fixing smaller things.  Andrew, do you have a
sense of how far these are from a committable state?

2838: brin: avoid errors processing work item following concurrent reindex
===========================================================================
Alvaro proposed an alternative fix, and the thread has since stalled.  Should this
fix be applied and the CF entry closed?

2877: Fix firing of RI triggers during cross-partition updates of partitioned tables
=====================================================================================
This is fixing a live issue in the backbranches, and it would be good to get
closure on it.  Amit, you mentioned a problem with the patch involving deferrable
foreign constraints; is this all fixed in the latest posted revision?  (sorry if
that's spelled out and I missed it)

2900: On client connection event trigger
========================================
This implements a client login event trigger commonly used for auditing and
connection setup tasks.  I've reviewed this a fair bit and I think it's pretty
much committable, but there is an ongoing discussion on whether it needs an
escape-hatch GUC to disable such triggers in case they brick the database.  I am
against such a GUC, but there has been a lot of support voiced in the thread for
it.  If we can put that to rest, I think we can close this patch.

2903: Parallel Hash Full Join
=============================
This patch is blocked on a reproducible crash which is being worked on.

2864: Fix pg_rewind race condition just after promotion
2869: Mitigate pg_rewind race condition, if config file is enlarged concurrently.
==================================================================================
Grouping similar entries.  Both fix live bugs (one reported in 2015), and both
patches were reviewed and approved months ago.  Seems like we should be able to
close these ones in this CF.

2824: Functions for sorting GiST build of gist_btree types
==========================================================
This seems to be getting close to committable after bug investigation by Heikki.

2837: pg_stat_statements and "IN" conditions
============================================
This patch aims to consolidate (based on a configurable threshold)
SELECT .. WHERE c IN (a, b, ...) queries which are variations of the same query,
to keep them from cluttering up pg_stat_statements.  Seems like a good idea, and
the patch has been tested and reviewed to the point where it seems ready for
committer to me.

2839: possibility to rename root namespace in plpgsql
=====================================================
This patch addresses a feature request for making plpgsql easier to work with,
but the approach has been questioned, with the thread stalling since March in one
of those discussions.  Is there interest in picking this up, or should we close
it as Returned with Feedback?

2863: Printing backtrace of postgres processes
==============================================
This patch is being actively worked on and reviewed.

2851: Consider parallel for LATERAL subqueries having LIMIT/OFFSET
==================================================================
A nice little fix for supporting parallel query execution for
LATERAL .. LIMIT/OFFSET queries.  Recently marked Ready for Committer, so let's
try and get this in.

2861: pg_upgrade test for binary compatibility of core data types
=================================================================
This patch is also actively worked on and reviewed, and bits from this thread
have already landed.  Do you think this can land in its entirety in this CF,
Michael?
--
Daniel Gustafsson		https://vmware.com/
Daniel Gustafsson <daniel@yesql.se> writes: > 2773: libpq compression > ======================= > This patch intended to provide libpq connection compression to "replace SSL > compression" which was doomed when the patch was written, and have since been > removed altogether. The initial approach didn't get much traction but there > was significant discussion and work, which has since fizzled out. The patch > has been updated but there hasn't been meaningful review the past months, the > last comments seem to imply there being a fair amount of questionmarks left in > here. Robert, having been very involved in this do you have any thoughts on > where we are and where to go (if at all IYO)? I'm not Robert, but I still have an opinion here, and it's that this feature would at best be an attractive nuisance. If you need compression on a database session, it probably means that the connection is over the open internet, which means that you need encryption even more. And we know that compression and encryption do not play well together. The reason compression was taken out of the latest TLS standards is not that they wouldn't have liked to have it, nor that applying compression in a separate code layer would be any safer. I fear offering this would merely lead people to build CVE-worthy setups. regards, tom lane
Hi
2839: possibility to rename root namespace in plpgsql
=====================================================
This patch addresses a feature request for making plpgsql easier to work with,
but the approach has been questioned with the thread stalling since March in
one of these discussion. Is there interest in picking this up or should we
close it as Returned with Feedback?
This feature is in the category of "nice to have, but we can live without it". There are only a few requests per decade for this feature. I think my design is good enough, but surely it is not the only one possible.
We cannot take inspiration from Oracle's PL/SQL or from Ada, because that would require a feature that is not supported in Postgres (pragmas). Without pragmas it is relatively hard to find a good design. PL/pgSQL options are not native elements of the language (I think it is the best option), and the other designs are not fully correct (although they can look more natural). I don't see any passion (negative or positive) about this feature, and I think it can be returned without feedback. Maybe somebody will come up with other proposals, or maybe we will have pragmas one day.
Much more important is extending the debug support of PL/pgSQL. This patch can simplify implementations of PL/pgSQL debuggers, and this feature can be a real benefit for PL/pgSQL developers.
Regards
Pavel
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Daniel Gustafsson <daniel@yesql.se> writes: > > 2773: libpq compression > > ======================= > > This patch intended to provide libpq connection compression to "replace SSL > > compression" which was doomed when the patch was written, and have since been > > removed altogether. The initial approach didn't get much traction but there > > was significant discussion and work, which has since fizzled out. The patch > > has been updated but there hasn't been meaningful review the past months, the > > last comments seem to imply there being a fair amount of questionmarks left in > > here. Robert, having been very involved in this do you have any thoughts on > > where we are and where to go (if at all IYO)? > > I'm not Robert, but I still have an opinion here, and that it's that this > feature would at best be an attractive nuisance. If you need compression > on a database session, it probably means that the connection is over the > open internet, which means that you need encryption even more. And we > know that compression and encryption do not play well together. The > reason compression was taken out of the latest TLS standards is not that > they wouldn't have liked to have it, nor that applying compression in a > separate code layer would be any safer. I fear offering this would > merely lead people to build CVE-worthy setups. I've got an opinion on this also and I don't agree that needing compression means you're on the open internet or that we shouldn't allow users the choice to decide if they want to use compression and encryption together or not. Yes, there's potential risks there, but I don't think those risks would lead to CVEs against PG for supporting a mode where we allow compression and then also allow encryption- if that's a problem then it's an issue for the encryption library/method being used and isn't the fault of us for allowing that combination. Further, the general issue with encryption+compression attacks is when the attacker is able to inject chosen plaintext content into what ends up being compressed and encrypted, and then is able to observe the compressed+encrypted traffic, and that's not nearly as easy to do when you're talking about a database connection vs. a case where an attacker is able to convince a victim to go to a malicious website and through that is able to do cross-site requests to inject the known plaintext into requests to a trusted webpage and see the results coming back. That we don't have any support for such a thing in the PG protocol, that app server to DB traffic can certainly be a lot and means that compression is worthwhile even on a private network, and that users may wish to encrypt even on a private network, are all good reasons why we should support both compression and encryption and allow users to choose what makes sense for them in their environment. Thanks, Stephen
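To make the length side-channel concrete, here is a minimal standalone zlib sketch (illustrative only, not from any patch in this thread): when an attacker-controlled guess is compressed in the same stream as a secret, a guess sharing a longer prefix with the secret tends to compress slightly better, and an observer of the encrypted traffic can still measure that size. Real attacks need many probes and careful statistics; this only shows the principle.

/* Illustrative only: the length side-channel behind CRIME-style attacks.
 * Build with: cc crime_sketch.c -lz */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

static unsigned long
compressed_size(const char *buf, size_t len)
{
    Bytef   out[1024];
    uLongf  outlen = sizeof(out);

    if (compress2(out, &outlen, (const Bytef *) buf, len, Z_BEST_COMPRESSION) != Z_OK)
        return 0;
    return outlen;
}

int
main(void)
{
    const char *secret = "session_token=KX7VQ9";   /* value the attacker is after */
    const char *guesses[] = {"session_token=A", "session_token=K", "session_token=KX7V"};
    char        msg[256];

    for (int i = 0; i < 3; i++)
    {
        /* Attacker-chosen plaintext ends up next to the secret in one compressed stream. */
        snprintf(msg, sizeof(msg), "%s&injected=%s", secret, guesses[i]);
        printf("guess %-22s -> %lu compressed bytes\n", guesses[i],
               compressed_size(msg, strlen(msg)));
    }
    return 0;
}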
> Daniel Gustafsson <daniel@yesql.se> writes: > > 2773: libpq compression > > ======================= > > This patch intended to provide libpq connection compression to > > "replace SSL compression" which was doomed when the patch was > written, > > and have since been removed altogether. The initial approach didn't > > get much traction but there was significant discussion and work, which > > has since fizzled out. The patch has been updated but there hasn't > > been meaningful review the past months, the last comments seem to > > imply there being a fair amount of questionmarks left in here. > > Robert, having been very involved in this do you have any thoughts on > where we are and where to go (if at all IYO)? > > I'm not Robert, but I still have an opinion here, and that it's that this feature > would at best be an attractive nuisance. If you need compression on a > database session, it probably means that the connection is over the open > internet, which means that you need encryption even more. And we know > that compression and encryption do not play well together. The reason > compression was taken out of the latest TLS standards is not that they > wouldn't have liked to have it, nor that applying compression in a separate > code layer would be any safer. I fear offering this would merely lead people > to build CVE-worthy setups. > I don't have a strong opinion on the patch but I'd just like to mention that there are definitely use cases for using compression on private networks (with ssl off). We have cases where we have a PG extension that hooks into the COPY ... FROM command and accepts lz4-compressed data for the COPY data message specifically. This has proven invaluable to us even on private networks, because it can give a good compression ratio for very little CPU decompression overhead.
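The extension described above is not public, so purely as a generic sketch of what the decompression side of such a hook boils down to (the 4-byte length prefix and the function name are assumptions for illustration, not the poster's code), LZ4's safe decompression API is essentially a single cheap call:

/* Hypothetical sketch, not the extension discussed above: decompress one
 * LZ4-compressed COPY data chunk.  Assumes the sender prepends the
 * uncompressed length as a 4-byte big-endian integer.  Build with -llz4. */
#include <arpa/inet.h>      /* ntohl */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <lz4.h>

/* Returns a malloc'd buffer holding the raw COPY payload, or NULL on error. */
static char *
decode_copy_chunk(const char *wire, size_t wirelen, size_t *rawlen)
{
    uint32_t    be_len;
    char       *raw;
    int         n;

    if (wirelen < sizeof(be_len))
        return NULL;
    memcpy(&be_len, wire, sizeof(be_len));
    *rawlen = ntohl(be_len);

    raw = malloc(*rawlen);
    if (raw == NULL)
        return NULL;

    /* LZ4_decompress_safe rejects malformed input instead of overrunning. */
    n = LZ4_decompress_safe(wire + sizeof(be_len), raw,
                            (int) (wirelen - sizeof(be_len)), (int) *rawlen);
    if (n < 0 || (size_t) n != *rawlen)
    {
        free(raw);
        return NULL;
    }
    return raw;
}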
On 11/9/21 09:12, Daniel Gustafsson wrote: > > 2901: SQL/JSON: functions > 2901: SQL/JSON: JSON_TABLE > ========================== > Grouping these two together as they are in very similar states. The patches > seems to mostly move forward by rebases fixing smaller things. Andrew, do you > have a sense of how far these are from a committable state? > > For various reasons I'm going to be hors de combat for most of the rest of the year, back at the beginning of January, although I will be around if necessary, and probably tinkering with one or two things. I think at least the first set is not in bad shape, but it would benefit from some fresh eyeballs. Tom Kincaid is trying to arrange some extra resources for that, so we might well see some progress soon. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Tue, Nov 9, 2021 at 12:43:20PM -0500, Stephen Frost wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > > Daniel Gustafsson <daniel@yesql.se> writes: > > I'm not Robert, but I still have an opinion here, and that it's that this > > feature would at best be an attractive nuisance. If you need compression > > on a database session, it probably means that the connection is over the > > open internet, which means that you need encryption even more. And we > > know that compression and encryption do not play well together. The > > reason compression was taken out of the latest TLS standards is not that > > they wouldn't have liked to have it, nor that applying compression in a > > separate code layer would be any safer. I fear offering this would > > merely lead people to build CVE-worthy setups. > > I've got an opinion on this also and I don't agree that needing > compression means you're on the open internet or that we shouldn't allow > users the choice to decide if they want to use compression and > encryption together or not. Yes, there's potential risks there, but I > don't think those risks would lead to CVEs against PG for supporting a > mode where we allow compression and then also allow encryption- if > that's a problem then it's an issue for the encryption library/method > being used and isn't the fault of us for allowing that combination. Yeah, I am leaning this direction too. On the one hand, the number of ways to exploit compression-then-encryption seems to continue to grow, but it is the complex behavior of HTTPS and connecting to multiple sites which seems to always be required, and we don't do that with the Postgres client/server protocol. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Tue, Nov 09, 2021 at 03:12:39PM +0100, Daniel Gustafsson wrote: > 2800: CLUSTER on partitioned table > ================================== > The the patch above, it too spawned off from the REINDEX CONCURRENTLY thread > but seems to have stalled awaiting more review. There was scepticism raised > over the usefulness vs code weight tradeoff, but the response to this haven't > really been followed up on. Alvaro and Michael, do you have any changed > thoughts on Justins patch reorg or do you stil feel this isn't worth going > ahead with? My feeling on this one is that the addition was not worth the complexity, and I got the impression that the handling of pg_index.indisclustered was rather blurry when it came to multiple layers of partitions. > 2861: pg_upgrade test for binary compatibility of core data types > ================================================================= > This patch is also actively worked on and reviewed, and bits from this thread > have already landed. Do you think this can land in its entirety in this CF > Michael? The remaining parts are the addition of a SQL test, which requires more comments and documentation as the thing is complex to follow, and the switch of the upgrade SQL queries to a file. I need to do a pass over it, but that seems worth having on its own. For the second point, I wanted to double-check first with Andrew about how much code we could cut in the buildfarm client code. Thanks for the summary! -- Michael
> Daniel Gustafsson <daniel@yesql.se> writes: > >> 2773: libpq compression >> ======================= >> This patch intended to provide libpq connection compression to "replace SSL >> compression" which was doomed when the patch was written, and have since been >> removed altogether. The initial approach didn't get much traction but there >> was significant discussion and work, which has since fizzled out. The patch >> has been updated but there hasn't been meaningful review the past months, the >> last comments seem to imply there being a fair amount of questionmarks left in >> here. Robert, having been very involved in this do you have any thoughts on >> where we are and where to go (if at all IYO)? > > I'm not Robert, but I still have an opinion here, and that it's that this > feature would at best be an attractive nuisance. If you need compression > on a database session, it probably means that the connection is over the > open internet, which means that you need encryption even more. And we > know that compression and encryption do not play well together. The > reason compression was taken out of the latest TLS standards is not that > they wouldn't have liked to have it, nor that applying compression in a > separate code layer would be any safer. I fear offering this would > merely lead people to build CVE-worthy setups. > Compression is crucial for highly available setups. Replication traffic is often billed, or the route has bandwidth limits. The entropy added by WAL headers makes a CRIME attack against replication encryption impractical. Best regards, Andrey Borodin.
> On 9 Nov 2021, at 18:43, Stephen Frost <sfrost@snowman.net> wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Daniel Gustafsson <daniel@yesql.se> writes: >>> 2773: libpq compression >>> ======================= >>> This patch intended to provide libpq connection compression to "replace SSL >>> compression" which was doomed when the patch was written, and have since been >>> removed altogether. The initial approach didn't get much traction but there >>> was significant discussion and work, which has since fizzled out. The patch >>> has been updated but there hasn't been meaningful review the past months, the >>> last comments seem to imply there being a fair amount of questionmarks left in >>> here. Robert, having been very involved in this do you have any thoughts on >>> where we are and where to go (if at all IYO)? >> >> I'm not Robert, but I still have an opinion here, and that it's that this >> feature would at best be an attractive nuisance. If you need compression >> on a database session, it probably means that the connection is over the >> open internet, which means that you need encryption even more. And we >> know that compression and encryption do not play well together. The >> reason compression was taken out of the latest TLS standards is not that >> they wouldn't have liked to have it, nor that applying compression in a >> separate code layer would be any safer. I fear offering this would >> merely lead people to build CVE-worthy setups. > > I've got an opinion on this also and I don't agree that needing > compression means you're on the open internet or that we shouldn't allow > users the choice to decide if they want to use compression and > encryption together or not. Yes, there's potential risks there, but I > don't think those risks would lead to CVEs against PG for supporting a > mode where we allow compression and then also allow encryption- if > that's a problem then it's an issue for the encryption library/method > being used and isn't the fault of us for allowing that combination. I'm not so sure, if the parts aren't vulnerable separately on their own but the combination of them + libpq allows for weakening encryption in a TLS stream I'd say the CVE is on us. > Further, the general issue with encryption+compression attacks is when > the attacker is able to inject chosen plaintext content into what ends > up being compressed and encrypted, and then is able to observe the > compressed+encrypted traffic, and that's not nearly as easy to do when > you're talking about a database connection vs. a case where an attacker > is able to convince a victim to go to a malicious website and through > that is able to do cross-site requests to inject the known plaintext > into requests to a trusted webpage and see the results coming back. I agree with this, the known weaknesses rely on mechanisms that aren't readily applicable to postgres connections. I don't remember if the patch already does this, but if not we should ensure we only negotiate compression after authentication like how OpenSSH does [0]. -- Daniel Gustafsson https://vmware.com/ [0] https://www.openssh.com/txt/draft-miller-secsh-compression-delayed-00.txt
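To illustrate the OpenSSH-style delayed negotiation mentioned above (a sketch under assumptions, not code from the libpq compression patch): the client asks for compression at startup, but the codec is only activated once AuthenticationOk has been seen, so the authentication exchange itself never travels through the compressed stream. In the v3 protocol, AuthenticationOk is an 'R' message carrying an Int32 status of 0.

/* Sketch only: enable the compression codec after authentication completes,
 * in the spirit of OpenSSH's delayed compression. */
#include <stdbool.h>
#include <stdint.h>

typedef struct CompressionState
{
    bool    requested;      /* compression asked for at startup */
    bool    authenticated;  /* AuthenticationOk has been received */
    bool    active;         /* codec may be applied from here on */
} CompressionState;

/* Call for every backend message header the client sees. */
static void
observe_backend_message(CompressionState *st, char type, int32_t auth_status)
{
    if (type == 'R' && auth_status == 0)    /* AuthenticationOk */
        st->authenticated = true;

    /* Never compress the authentication exchange itself. */
    if (st->requested && st->authenticated)
        st->active = true;
}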
On Tue, Nov 9, 2021 at 7:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
> 2773: libpq compression
> =======================
> This patch intended to provide libpq connection compression to "replace SSL
> compression" which was doomed when the patch was written, and have since been
> removed altogether. The initial approach didn't get much traction but there
> was significant discussion and work, which has since fizzled out. The patch
> has been updated but there hasn't been meaningful review the past months, the
> last comments seem to imply there being a fair amount of questionmarks left in
> here. Robert, having been very involved in this do you have any thoughts on
> where we are and where to go (if at all IYO)?
I'm not Robert, but I still have an opinion here, and that it's that this
feature would at best be an attractive nuisance. If you need compression
on a database session, it probably means that the connection is over the
open internet, which means that you need encryption even more.
This assumption is very vague. I personally had multiple cases where network bandwidth for app <--> Postgres communication was an issue, which were fixed either via upgrades (spending more money) or by making application developers cache more data (optimization), or, usually, both. Public internet connections were never involved.
Actually, any growing startup experiences such issues sooner or later. Network compression would be a great tool for many setups, and some other popular database systems have offered it for a long time.
On 11/10/21 00:21, Bruce Momjian wrote: > On Tue, Nov 9, 2021 at 12:43:20PM -0500, Stephen Frost wrote: >> * Tom Lane (tgl@sss.pgh.pa.us) wrote: >>> Daniel Gustafsson <daniel@yesql.se> writes: >>> I'm not Robert, but I still have an opinion here, and that it's that this >>> feature would at best be an attractive nuisance. If you need compression >>> on a database session, it probably means that the connection is over the >>> open internet, which means that you need encryption even more. And we >>> know that compression and encryption do not play well together. The >>> reason compression was taken out of the latest TLS standards is not that >>> they wouldn't have liked to have it, nor that applying compression in a >>> separate code layer would be any safer. I fear offering this would >>> merely lead people to build CVE-worthy setups. >> >> I've got an opinion on this also and I don't agree that needing >> compression means you're on the open internet or that we shouldn't allow >> users the choice to decide if they want to use compression and >> encryption together or not. Yes, there's potential risks there, but I >> don't think those risks would lead to CVEs against PG for supporting a >> mode where we allow compression and then also allow encryption- if >> that's a problem then it's an issue for the encryption library/method >> being used and isn't the fault of us for allowing that combination. > > Yeah, I am leaning this diretion too. On the one hand, the number of > ways to exploit compression-then-encryption seem to continue to grow, > but it is the complex behavior of HTTPS and connecting to multiple sites > which seems to always be required, and we don't do that with the > Postgres client/server protocol. > +0.5 I don't think the number of attacks grows all that much, actually. Or more precisely, the new vulnerabilities are just variations on the CRIME pattern adapting it to different contexts. For example Voracle [1] applies to openvpn connections, but in principle it's the same scenario - observing changes to the compressed size, after tweaking the requests. [1] https://community.openvpn.net/openvpn/wiki/VORACLE So it's true compression is a side-channel, but these attacks require other ingredients too - ability to MITM the connection, and ability to inject custom data. Doing MITM is probably harder than when attacking HTTPS, because DB connections tend to be between app server and db server - so setting up a rogue AP does not really work. Sometimes people / DBAs may connect to the DB directly, although in a way that makes it hard to modify the data (which is why psql/ssh are immune to this). But it's possible. But how hard is injecting custom data in a way that'd make a CRIME-like attack possible? Yes, you could hijack a webapp similarly to CRIME, in a way that initiates database requests with custom data. But we don't generally send both the supplied and actual password to the DB to do the check there - we just send one of them, and check it against the value stored in the DB. The one (somewhat) plausible scenario I can think of is an application doing UPDATE t SET field = user_data WHERE id = 1; SELECT field FROM t WHERE ...; in which case we might learn if some other row in the result contains the same value. Not sure how practical this could be, though. AFAIK OpenVPN still supports tunnel compression, but it's disabled by default. 
That probably says something about what they think about the feature, although that might be due to having compression and not wanting to just remove it. One interesting idea OpenVPN allows is asymmetric compression, i.e. allowing compression in just one direction. For example it might be possible for the client to send compressed data, but the server would never send compressed. Or vice versa. That might still help with e.g. large imports or exports, depending on the use case, while maybe mitigating some issues. But it's clearly not a solid defense. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/10/21 16:54, Andrey Borodin wrote: > > >> Daniel Gustafsson <daniel@yesql.se> writes: >> >>> 2773: libpq compression >>> ======================= >>> This patch intended to provide libpq connection compression to "replace SSL >>> compression" which was doomed when the patch was written, and have since been >>> removed altogether. The initial approach didn't get much traction but there >>> was significant discussion and work, which has since fizzled out. The patch >>> has been updated but there hasn't been meaningful review the past months, the >>> last comments seem to imply there being a fair amount of questionmarks left in >>> here. Robert, having been very involved in this do you have any thoughts on >>> where we are and where to go (if at all IYO)? >> >> I'm not Robert, but I still have an opinion here, and that it's that this >> feature would at best be an attractive nuisance. If you need compression >> on a database session, it probably means that the connection is over the >> open internet, which means that you need encryption even more. And we >> know that compression and encryption do not play well together. The >> reason compression was taken out of the latest TLS standards is not that >> they wouldn't have liked to have it, nor that applying compression in a >> separate code layer would be any safer. I fear offering this would >> merely lead people to build CVE-worthy setups. >> > > Compression is crucial for highly available setups. Replication traffic is often billed. Or route has bandwidth limits. > An entropy added by WAL headers makes CRIME attack against replication encryption impractical. I very much doubt WAL headers are a reliable protection against CRIME, because the entropy of the headers is likely fairly constant. So if you compress the WAL stream, the WAL headers may change but the compression ratio should be pretty similar. At least that's my guess. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On 11/10/21 16:54, Andrey Borodin wrote: > >> Compression is crucial for highly available setups. Replication traffic is often billed. Or route has bandwidth limits. >> An entropy added by WAL headers makes CRIME attack against replication encryption impractical. > > I very much doubt WAL headers are a reliable protection against CRIME, > because the entropy of the headers is likely fairly constant. So if you > compress the WAL stream, the WAL headers may change but the compression > ratio should be pretty similar. At least that's my guess. I've thought more about it and I agree. To reliably protect against CRIME, the entropy of WAL headers must be comparable with the entropy of possibly injected data. If that were required, our WAL would probably need a really serious rework. Maybe just refuse to enable compression on SSL connections? If someone really needs both - they will just patch a server on their own. Or make a GUC "yes_i_know_what_crime_is_give_grant_read_on_my_data_to_spies". Best regards, Andrey Borodin.
Greetings, * Andrey Borodin (x4mmm@yandex-team.ru) wrote: > > On 11/10/21 16:54, Andrey Borodin wrote: > >> Compression is crucial for highly available setups. Replication traffic is often billed. Or route has bandwidth limits. > >> An entropy added by WAL headers makes CRIME attack against replication encryption impractical. > > > > I very much doubt WAL headers are a reliable protection against CRIME, > > because the entropy of the headers is likely fairly constant. So if you > > compress the WAL stream, the WAL headers may change but the compression > > ratio should be pretty similar. At least that's my guess. > > I've thought more about it and I agree. > To reliably protect against CRIME entropy of WAL headers must be comparable with the entropy of possibly injected data. > If this would stand, probably, our WAL would need a really serious rework. > > Maybe just refuse to enable compression on SSL connection? If someone really needs both - they will just patch a serveron their own. > Or make a GUC "yes_i_kwow_what_crime_is_give_grant_read_on_my_data_to_spies". Attackers aren't likely to have the kind of isolated control over the data in the WAL stream (which is a combination of data from lots of ongoing activity in the system and isn't likely to be exactly what the attacker supplied at some higher level anyway) and the ability to read and analyze the WAL stream from a primary to a replica to be able to effectively attack it. None of this kind of babysitting makes any sense to me. I'm certainly fine with documenting that there are cases where compression of data sent by an attacker and then sent over an encrypted channel has been able to break the encryption and let admins decide if it's appropriate or not for them to use in their environment. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > Attackers aren't likely to have the kind of isolated control over the > data in the WAL stream (which is a combination of data from lots of > ongoing activity in the system and isn't likely to be exactly what the > attacker supplied at some higher level anyway) and the ability to read > and analyze the WAL stream from a primary to a replica to be able to > effectively attack it. Yeah, I concur with that so far as WAL data goes. A hypothetical attacker will not have control over xact IDs, tuple TIDs, etc, which will add enough entropy to the stream that extracting data payloads seems pretty infeasible. My concern upthread was about client-session connections, where such mitigation doesn't apply. (I wonder a bit about logical-replication streams, too.) regards, tom lane
On Tue, Nov 9, 2021 at 9:12 AM Daniel Gustafsson <daniel@yesql.se> wrote: > 2773: libpq compression > ======================= > This patch intended to provide libpq connection compression to "replace SSL > compression" which was doomed when the patch was written, and have since been > removed altogether. The initial approach didn't get much traction but there > was significant discussion and work, which has since fizzled out. The patch > has been updated but there hasn't been meaningful review the past months, the > last comments seem to imply there being a fair amount of questionmarks left in > here. Robert, having been very involved in this do you have any thoughts on > where we are and where to go (if at all IYO)? It's been a while since I looked at that patch, and my recollections are somewhat fuzzy. However, I think that my main feeling was that the quality of the patch didn't seem to be that close to what I felt would be required for a commit. I think at that point there were still significant questions about the details of the approach as well as code-quality and performance concerns. It may have gotten better since last I looked. I do agree with the comments from others that there is room to debate (1) the usefulness of this and (2) the degree to which it might leave us on the hook for security problems that wouldn't be our responsibility if we left this up to OpenSSL (or maybe some other SSL library, if we ever get pluggability there, which I hope we do). But my concerns were more pragmatic than theoretical. -- Robert Haas EDB: http://www.enterprisedb.com
> On 15 Nov 2021, at 15:32, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Nov 9, 2021 at 9:12 AM Daniel Gustafsson <daniel@yesql.se> wrote: >> 2773: libpq compression >> ======================= >> This patch intended to provide libpq connection compression to "replace SSL >> compression" which was doomed when the patch was written, and have since been >> removed altogether. The initial approach didn't get much traction but there >> was significant discussion and work, which has since fizzled out. The patch >> has been updated but there hasn't been meaningful review the past months, the >> last comments seem to imply there being a fair amount of questionmarks left in >> here. Robert, having been very involved in this do you have any thoughts on >> where we are and where to go (if at all IYO)? > > It's been a while since I looked at that patch, and my recollections > are somewhat fuzzy. No worries, thanks for sharing! > However, I think that my main feeling was that the > quality of the patch didn't seem to be that close to what I felt would > be required for a commit. I think at that point there were still > significant questions about the details of the approach as well as > code-quality and performance concerns. It may have gotten better since > last I looked. Context switching into this thread is hard, so it's not clear how much the patch has moved since you reviewed, or how the current patchset relates to the earlier one as there seems to have been a shift at a recent(ish) point in time. The last message (which is unreadable HTML soup in the archives) talks about implementing lz4 with "Looking forward to post an update soon." Given the discussions had and the doubts over the approach, I wonder if the best course of action with the CF entry in question is to close it, hash out the what and how on -hackers and reopen a new entry when there is a patch. We are not doing ourselves, or the CF process, any favors by moving an entry along which isn't all that close to committable. Thoughts (addressing everyone involved/interested)? > I do agree with the comments from others that there is room to debate > (1) the usefulness of this and (2) the degree to which it might leave > us on the hook for security problems that wouldn't be our > responsibility if we left this up to OpenSSL (or maybe some other SSL > library, if we ever get pluggability there, which I hope we do). But > my concerns were more pragmatic than theoretical. I share your hope here =) However I don't think we'll get much help with compression from the TLS libraries. OpenSSL has deprecated transport compression, NSS never implemented it in the first place, and TLS 1.3 removes it altogether. That being said, this is mainly due to protocols where chosen plaintext attacks are practical/possible. For our case we, IMHO, should look more at how OpenSSH and other protocols closer to ours have built compression in a safe way, to see if we can achieve the level of transport safety we require. But, there is I think broad consensus that we at least must fully understand the tradeoffs should we allow compression. -- Daniel Gustafsson https://vmware.com/
Greetings, * Daniel Gustafsson (daniel@yesql.se) wrote: > > On 15 Nov 2021, at 15:32, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Nov 9, 2021 at 9:12 AM Daniel Gustafsson <daniel@yesql.se> wrote: > >> 2773: libpq compression > >> ======================= > >> This patch intended to provide libpq connection compression to "replace SSL > >> compression" which was doomed when the patch was written, and have since been > >> removed altogether. The initial approach didn't get much traction but there > >> was significant discussion and work, which has since fizzled out. The patch > >> has been updated but there hasn't been meaningful review the past months, the > >> last comments seem to imply there being a fair amount of questionmarks left in > >> here. Robert, having been very involved in this do you have any thoughts on > >> where we are and where to go (if at all IYO)? > > > > It's been a while since I looked at that patch, and my recollections > > are somewhat fuzzy. > > No worries, thanks for sharing! > > > However, I think that my main feeling was that the > > quality of the patch didn't seem to be that close to what I felt would > > be required for a commit. I think at that point there were still > > significant questions about the details of the approach as well as > > code-quality and performance concerns. It may have gotten better since > > last I looked. > > Context switching into this thread is hard, so it's not clear how much the > patch has moved since you reviewed, or how the current patchset relates to the > earlier one as there seems to have been a shift at a recent(ish) point in time. > The last message (which is unreadable HTML soup in the archives) talks about > implementing lz4 with "Looking forward to post an update soon." Not sure where the best place to drop this comment is, but the patch seems to be doing exactly what I was hoping it didn't- it's just taking the ultimate results and compressing them and sending them on to the client, at least that's what it looks like in a quick look. What about when that data has already been compressed and stored compressed and not manipulated at all? The initial post in that thread talks about returning data from 'large JSON columns'. With this approach, we would likely end up taking compressed json/jsonb data, decompressing it, and then re-compressing it to send to the client. That seems ... less than ideal. I get that just compressing the entire stream is simpler and easier and such, but it's surely cheaper and more efficient to not decompress and then recompress data that's already compressed. Finding a way to pass through data that's already compressed when stored as-is while also supporting compression of everything else (in a sensible way- wouldn't make sense to just compress each attribute independently since a 4 byte integer isn't going to get smaller with compression) definitely complicates the overall idea but perhaps would be possible to do. > Given the discussions had and the doubts over approach I wonder if the best > course of action with the CF entry in question is to close it, hash out the > what and how on -hackers and reopen a new entry when there is a patch which. > We are not doing ourselves, or the CF process, any favors by moving an entry > along which isn't all that close to committable. > > Thoughts (addressing everyone involved/interested)? Yeah, agreed with this in general. Thanks, Stephen
On Mon, Nov 15, 2021 at 2:51 PM Stephen Frost <sfrost@snowman.net> wrote: > I get that just compressing the entire stream is simpler and easier and > such, but it's surely cheaper and more efficient to not decompress and > then recompress data that's already compressed. Finding a way to pass > through data that's already compressed when stored as-is while also > supporting compression of everything else (in a sensible way- wouldn't > make sense to just compress each attribute independently since a 4 byte > integer isn't going to get smaller with compression) definitely > complicates the overall idea but perhaps would be possible to do. To me, this feels like an attempt to move the goalposts far enough to kill the project. Sure, in a perfect world, that would be nice. But, we don't do it anywhere else. If you try to store a JPEG into a bytea column, we'll try to compress it just like we would any other data, and it may not work out. If you then take a pg_basebackup of the database using -Z, there's no attempt made to avoid the CPU overhead of compressing those TOAST table pages that contain already-compressed data and not the others. And it's easy to understand why that's the case: when you insert data into the database, there's no way for the database to magically know whether that data has been previously compressed by some means, and if so, how effectively. And when you back up a database, the backup doesn't know which relfilenodes contain TOAST tables or which pages of those relfilenodes contain data that is already pre-compressed. In both cases, your options are either (1) shut off compression yourself or (2) hope that the compressor doesn't waste too much effort on it. I think the same approach ought to be completely acceptable here. I don't even really understand how we could do anything else. printtup() just gets datums, and it has no idea whether or how they are toasted. It calls the type output functions which don't know that data is being prepared for transmission to the client as opposed to some other hypothetical way you could call that function, nor do they know what compression method the client wants. It does not seem at all straightforward to teach them that ... and even if they did, what then? It's not like every column value is sent as a separate packet; the whole row is a single protocol message, and some columns may be compressed and others uncompressed. Trying to guess what to do about that seems to boil down to a sheer guess. Unless you try to compress that mixture of compressed and uncompressed values - and it's moderately uncommon for every column of a table to even be toastable - you aren't going to know how well it will compress. You could easily waste more CPU cycles trying to guess than you would have spent just doing what the user asked for. -- Robert Haas EDB: http://www.enterprisedb.com
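As a sketch of the framing Robert describes (illustrative only, not backend code): a whole row travels as a single DataRow ('D') message, and each column inside it is just a length-prefixed value, so there is currently no per-column slot in which to mark a value as already compressed without a protocol change.

/* Sketch of v3-protocol DataRow framing: one message per row, each column a
 * length-prefixed value.  Error handling omitted; out is assumed big enough. */
#include <arpa/inet.h>      /* htonl, htons */
#include <stdint.h>
#include <string.h>

static size_t
build_data_row(char *out, const char **vals, const int32_t *lens, int16_t ncols)
{
    size_t      pos = 5;    /* room for type byte + Int32 message length */
    uint16_t    n16 = htons((uint16_t) ncols);
    uint32_t    total;

    memcpy(out + pos, &n16, 2);
    pos += 2;

    for (int i = 0; i < ncols; i++)
    {
        uint32_t    l32 = htonl((uint32_t) lens[i]);    /* -1 means NULL */

        memcpy(out + pos, &l32, 4);
        pos += 4;
        if (lens[i] > 0)
        {
            memcpy(out + pos, vals[i], (size_t) lens[i]);
            pos += (size_t) lens[i];
        }
    }

    out[0] = 'D';                               /* DataRow */
    total = htonl((uint32_t) (pos - 1));        /* length counts itself, not the type byte */
    memcpy(out + 1, &total, 4);
    return pos;
}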
Greetings, * Robert Haas (robertmhaas@gmail.com) wrote: > On Mon, Nov 15, 2021 at 2:51 PM Stephen Frost <sfrost@snowman.net> wrote: > > I get that just compressing the entire stream is simpler and easier and > > such, but it's surely cheaper and more efficient to not decompress and > > then recompress data that's already compressed. Finding a way to pass > > through data that's already compressed when stored as-is while also > > supporting compression of everything else (in a sensible way- wouldn't > > make sense to just compress each attribute independently since a 4 byte > > integer isn't going to get smaller with compression) definitely > > complicates the overall idea but perhaps would be possible to do. > > To me, this feels like an attempt to move the goalposts far enough to > kill the project. Sure, in a perfect world, that would be nice. But, > we don't do it anywhere else. If you try to store a JPEG into a bytea > column, we'll try to compress it just like we would any other data, > and it may not work out. If you then take a pg_basebackup of the > database using -Z, there's no attempt made to avoid the overhead of > CPU overhead of compressing those TOAST table pages that contain > already-compressed data and not the others. And it's easy to > understand why that's the case: when you insert data into the > database, there's no way for the database to magically know whether > that data has been previously compressed by some means, and if so, how > effectively. And when you back up a database, the backup doesn't know > which relfilenodes contain TOAST tables or which pages of those > relfilenodes contain that is already pre-compressed. In both cases, > your options are either (1) shut off compression yourself or (2) hope > that the compressor doesn't waste too much effort on it. I'll grant that perhaps it's just a different project. While, sure, we will try to compress things we don't understand by default, the admin does have ways to tell us to not do that and so it isn't always just guess-work. As for other parts of the system not being smarter- they should be. I'd argue that up until quite recently it didn't make sense to teach something like pg_basebackup about TOAST tables because our compression was rather lacking and it was actually useful to compress TOAST tables (perhaps still is with the compression methods we have now, but we'll hopefully add more). We should probably look at trying to provide a way for pg_basebackup and other backup tools to identify TOAST tables to avoid trying to compress them as it's just wasted effort (though in that case, it's at least less wasted than here- we don't decompress that TOAST data and then re-compress it). Using bytea for everything-and-the-kitchen-sink also just generally seems like it's throwing away more information than we really should (if we had a jpeg data type, for example, we could just have that default to not being compressed and not spend time trying to compress jpegs). I also just generally disagree that an inefficiency in one part of the system justifies having it somewhere else. > I think the same approach ought to be completely acceptable here. I > don't even really understand how we could do anything else. printtup() > just gets datums, and it has no idea whether or how they are toasted. 
> It calls the type output functions which don't know that data is being > prepared for transmission to the client as opposed to some other > hypothetical way you could call that function, nor do they know what > compression method the client wants. It does not seem at all > straightforward to teach them that ... and even if they did, what > then? It's not like every column value is sent as a separate packet; > the whole row is a single protocol message, and some columns may be > compressed and others uncompressed. Trying to guess what to do about > that seems to boil down to a sheer guess. Unless you try to compress > that mixture of compressed and uncompressed values - and it's > moderately uncommon for every column of a table to be even be > toastable - you aren't going to know how well it will compress. You > could easily waste more CPU cycles trying to guess than you would have > spent just doing what the user asked for. I agree that we wouldn't want to be doing this all based on guesswork somehow. Having a way to identify what's compressed and what isn't would be needed, as well as what method was used to compress. It's not like we don't know that at the source, we'd just need a way to pass that back through- probably more-or-less exactly as how it's stored today. Tackling the actual protocol side of it seems like it'd be the more sensible place to start working through a design and I don't think that'd be trivial to do, so probably not something to try and hash out in this particular thread. I'm happy to have at least voiced that thought. Thanks, Stephen
Hi! It’s been a while since the original patch release. Let me provide a brief overview of the current patch status. The initial approach was to use the streaming compression to compress all outgoing and decompress all incoming bytes. However, after the long discussion in the thread, the initial approach has been changed. The current implementation allows compressing only specific message types, use the different compression algorithms for different message types, configure the allowed compression methods and levels both for server- and client- sides via GUC setting / connection string respectively. Also, current implementation combines (when possible) multiple protocol messages into the single CompressedData message for a better compression ratio. > > On 16 Nov 2021, at 01:23, Robert Haas <robertmhaas@gmail.com> wrote: > > To me, this feels like an attempt to move the goalposts far enough to > kill the project. Sure, in a perfect world, that would be nice. But, > we don't do it anywhere else. If you try to store a JPEG into a bytea > column, we'll try to compress it just like we would any other data, > and it may not work out. If you then take a pg_basebackup of the > database using -Z, there's no attempt made to avoid the overhead of > CPU overhead of compressing those TOAST table pages that contain > already-compressed data and not the others. And it's easy to > understand why that's the case: when you insert data into the > database, there's no way for the database to magically know whether > that data has been previously compressed by some means, and if so, how > effectively. And when you back up a database, the backup doesn't know > which relfilenodes contain TOAST tables or which pages of those > relfilenodes contain that is already pre-compressed. In both cases, > your options are either (1) shut off compression yourself or (2) hope > that the compressor doesn't waste too much effort on it. > > I think the same approach ought to be completely acceptable here. I > don't even really understand how we could do anything else. printtup() > just gets datums, and it has no idea whether or how they are toasted. > It calls the type output functions which don't know that data is being > prepared for transmission to the client as opposed to some other > hypothetical way you could call that function, nor do they know what > compression method the client wants. It does not seem at all > straightforward to teach them that ... and even if they did, what > then? It's not like every column value is sent as a separate packet; > the whole row is a single protocol message, and some columns may be > compressed and others uncompressed. Trying to guess what to do about > that seems to boil down to a sheer guess. Unless you try to compress > that mixture of compressed and uncompressed values - and it's > moderately uncommon for every column of a table to be even be > toastable - you aren't going to know how well it will compress. You > could easily waste more CPU cycles trying to guess than you would have > spent just doing what the user asked for. > Agree. From my POV, it is OK to use the protocol message type and length to decide should it be compressed or not. Also, this can be optimized later without the need to change the protocol. Regarding the LZ4 support patch, it still has some minor polishing to do. Basically, it only adds the LZ4 algorithm support and does not change anything fundamentally. So I would appreciate someone doing a review of the current patch version. 
The original thread is quite huge so I guess that it makes it hard to catch up with the current patch status. I can make a new one with a detailed summary if that would help. Thanks, Daniil Zakhlystov
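As a rough illustration of the batching idea described above (this is not code from the submitted patch; the 'z' frame type byte, the threshold, and the buffer sizes are assumptions for the example), several small protocol messages are queued and then wrapped into one compressed frame:

/* Sketch only, not the submitted patch: queue complete protocol messages and,
 * once the caller decides enough has accumulated (e.g. buf->len >=
 * FLUSH_THRESHOLD), wrap them in a single compressed frame.  Build with -lz. */
#include <arpa/inet.h>      /* htonl */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <zlib.h>

#define FLUSH_THRESHOLD 8192

typedef struct MsgBuffer
{
    char    data[65536];
    size_t  len;
} MsgBuffer;

/* Append one complete message (type byte + Int32 length + payload). */
static bool
queue_message(MsgBuffer *buf, const char *msg, size_t msglen)
{
    if (buf->len + msglen > sizeof(buf->data))
        return false;
    memcpy(buf->data + buf->len, msg, msglen);
    buf->len += msglen;
    return true;
}

/* Wrap everything queued so far into one compressed frame written to out;
 * returns the frame size, or 0 on error.  out must hold outcap bytes. */
static size_t
flush_compressed(MsgBuffer *buf, char *out, size_t outcap)
{
    uLongf      clen;
    uint32_t    n;

    if (outcap < 6 || buf->len == 0)
        return 0;
    clen = (uLongf) (outcap - 5);
    if (compress2((Bytef *) out + 5, &clen,
                  (const Bytef *) buf->data, buf->len, Z_DEFAULT_COMPRESSION) != Z_OK)
        return 0;

    out[0] = 'z';                           /* assumed compressed-frame type */
    n = htonl((uint32_t) (clen + 4));       /* Int32 length counts itself */
    memcpy(out + 1, &n, 4);
    buf->len = 0;
    return (size_t) clen + 5;
}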
> > 2780: Allow COPY "text" to output a header and add header matching mode to COPY > FROM > =============================================================================== > The original patch was rejected for lacking matching, but it has since been > addressed in an updated patch which has seen a lot of review. Reading the > thread it seems this should really be in Ready for Committer. I don't think there is anything more that needs to be done for this patch. I have been checking that it applies cleanly and that the tests still pass, but it has already had multiple rounds of review and is probably ready. Best regards, Rémi