Thread: Commitfest 2021-11 Patch Triage - Part 2

Commitfest 2021-11 Patch Triage - Part 2

From
Daniel Gustafsson
Date:
Below is the second pass over the patches in the current commitfest, this time
the patches which have been around for 5 or 6 commitfests.  The usual
disclaimers that I very well might have missed something still apply.


2773: libpq compression
=======================
This patch intended to provide libpq connection compression to "replace SSL
compression", which was already doomed when the patch was written and has
since been removed altogether.  The initial approach didn't get much traction
but there was significant discussion and work, which has since fizzled out.
The patch has been updated but there hasn't been meaningful review in the past
months, and the last comments seem to imply there is a fair number of question
marks left here.  Robert, having been very involved in this, do you have any
thoughts on where we are and where to go (if at all IYO)?


2765: Add extra statistics to explain for Nested Loop
=====================================================
There has been almost exclusively positive feedback on this idea, and the patch
has matured over several rounds of fixes.  Any committer keen on picking this
up and getting it across the finish line?


2780: Allow COPY "text" to output a header and add header matching mode to COPY
FROM
===============================================================================
The original patch was rejected for lacking a header matching mode, but that
has since been addressed in an updated patch which has seen a lot of review.
Reading the thread it seems this should really be in Ready for Committer.
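
For reference, the behavior under discussion looks roughly like this (the
exact option spellings are per my reading of the thread and may change):

    -- allow a header line in text format, as csv already does:
    COPY tab TO STDOUT WITH (FORMAT text, HEADER true);
    -- and the proposed matching mode, erroring out if the file's header
    -- does not match the table's column names:
    COPY tab FROM STDIN WITH (FORMAT text, HEADER match);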


2815: CREATE INDEX CONCURRENTLY on partitioned table
====================================================
This spawned off from the work on REINDEX CONCURRENTLY but seems to have
stalled and is awaiting more review.
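
For reference, the command the patch aims to support; today this fails with
"cannot create index on partitioned table ... concurrently":

    CREATE INDEX CONCURRENTLY ON measurement (logdate);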


2800: CLUSTER on partitioned table
==================================
Like the patch above, this one also spawned off from the REINDEX CONCURRENTLY
thread but seems to have stalled awaiting more review.  There was scepticism
raised over the usefulness vs code weight tradeoff, but the responses to this
haven't really been followed up on.  Alvaro and Michael, do you have any
changed thoughts on Justin's patch reorg, or do you still feel this isn't
worth going ahead with?
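
For reference, the patch is about allowing the following, which currently
fails with "cannot cluster a partitioned table" (hypothetical names):

    CREATE TABLE measurement (logdate date NOT NULL)
      PARTITION BY RANGE (logdate);
    CREATE INDEX measurement_idx ON measurement (logdate);
    CLUSTER measurement USING measurement_idx;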


2897: Faster pglz compression
=============================
Tomas did benchmarking recently which shows promising speedups and is actively
working on getting it ready for commit.


2906: Row filtering for logical replication
===========================================
This patch is undergoing lots of active review.


2831: Extended statistics / estimate Var op Var clauses
=======================================================
The patch and approach have been approved by reviewers; it's down to the final
push over the finish line from what I can gather.  Seems to be awaiting
benchmarks by Tomas on the relative costs of the new operations in planning.
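
For the unfamiliar, a sketch of the clause shape in question (hypothetical
table t with columns a and b):

    -- a "Var op Var" clause compares two columns rather than a column
    -- and a constant:
    SELECT * FROM t WHERE a < b;
    -- the patch lets extended statistics built on the columns improve
    -- that estimate:
    CREATE STATISTICS t_ab (mcv) ON a, b FROM t;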


2918: New default role allowing to change per-role/database settings
====================================================================
Stephen, you had comments on the patch which have since been addressed.  Any
thoughts on the new revision?


2911: Dynamic result sets from procedures
=========================================
This patch is gated on the SHOW_ALL_RESULTS patch, and is thus not moving until
there is further progress on that one.


2901: SQL/JSON: functions
2902: SQL/JSON: JSON_TABLE
==========================
Grouping these two together as they are in very similar states.  The patches
seem to mostly move forward via rebases fixing smaller things.  Andrew, do you
have a sense of how far these are from a committable state?


2838: brin: avoid errors processing work item following concurrent reindex
==========================================================================
Alvaro proposed an alternative fix, and the thread has since stalled.  Should
this fix be applied and the CF entry closed?


2877: Fix firing of RI triggers during cross-partition updates of partitioned
tables
=============================================================================
This is fixing a live issue in the backbranches, and would be good to get
closure on.  Amit, you mentioned a problem with the patch involving deferrable
foreign key constraints; is this all fixed in the latest posted revision?  (sorry
if that's spelled out and I missed it)


2900: On client connection event trigger
========================================
This implements a client login event trigger commonly used for auditing and
connection setup tasks.  I've reviewed this a fair bit and I think it's pretty
much committable, but there is an ongoing discussion on whether it needs an
escape-hatch GUC to disable such triggers in case they brick the database.  I
am against such a GUC but there has been a lot of support voiced in the thread
for it. If we can put that to rest I think we can close this patch.
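
To illustrate what is being proposed (the event name, here client_connection,
follows my reading of the thread and may differ in the final patch):

    CREATE FUNCTION log_login() RETURNS event_trigger AS $$
    BEGIN
      RAISE LOG 'connection by %', session_user;
    END;
    $$ LANGUAGE plpgsql;

    CREATE EVENT TRIGGER on_login ON client_connection
      EXECUTE FUNCTION log_login();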


2903: Parallel Hash Full Join
=============================
This patch is blocked on a reproducible crash which is being worked on.


2864: Fix pg_rewind race condition just after promotion
2869: Mitigate pg_rewind race condition, if config file is enlarged
concurrently.
===================================================================
Grouping similar entries.  Both fix live bugs (one reported in 2015), and both
patches have been reviewed and approved for months.  Seems like we should be
able to close these ones in this CF.


2824: Functions for sorting GiST build of gist_btree types
==========================================================
This seems to be getting close to committable after bug investigation by Heikki.


2837: pg_stat_statements and "IN" conditions
============================================
This patch aims to keep variations of SELECT .. WHERE c IN (a, b, ...) queries
from cluttering up pg_stat_statements by consolidating them (based on a
configurable threshold).  Seems like a good idea, and the patch has been tested
and reviewed to the point where it seems ready for committer to me.
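
For illustration, with a hypothetical threshold of two entries, queries like
these would no longer each get their own pg_stat_statements entry:

    SELECT * FROM t WHERE c IN (1, 2, 3);
    SELECT * FROM t WHERE c IN (4, 5, 6, 7, 8);
    -- hypothetical consolidated representation of both:
    -- SELECT * FROM t WHERE c IN (...)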


2839: possibility to rename root namespace in plpgsql
=====================================================
This patch addresses a feature request for making plpgsql easier to work with,
but the approach has been questioned, and the thread has stalled since March in
one of those discussions.  Is there interest in picking this up or should we
close it as Returned with Feedback?
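
For reference, the feature is about qualifying variables with a user-chosen
root label instead of the function name; a sketch, where the #routine_label
option spelling is from the proposal as I understand it:

    CREATE FUNCTION f() RETURNS int AS $$
    #routine_label r
    DECLARE x int := 42;
    BEGIN
      RETURN r.x;  -- qualified by the renamed root namespace, not f.x
    END;
    $$ LANGUAGE plpgsql;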


2863: Printing backtrace of postgres processes
==============================================
This patch is being actively worked on and reviewed.


2851: Consider parallel for LATERAL subqueries having LIMIT/OFFSET
==================================================================
A nice little fix for supporting parallel query execution for LATERAL ..
LIMIT/OFFSET queries.  Recently marked ready for committer, so let's try and
get this in.
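
An example of the query shape this covers (hypothetical tables; the LIMIT
previously disabled parallelism for the whole query, but it is safe here
since the subquery runs per outer row):

    SELECT o.id, s.item
    FROM   orders o,
           LATERAL (SELECT i.item FROM items i
                    WHERE i.order_id = o.id
                    LIMIT 1) s;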


2861: pg_upgrade test for binary compatibility of core data types
=================================================================
This patch is also actively worked on and reviewed, and bits from this thread
have already landed.  Do you think this can land in its entirety in this CF,
Michael?


--
Daniel Gustafsson        https://vmware.com/





Re: Commitfest 2021-11 Patch Triage - Part 2

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> 2773: libpq compression
> =======================
> This patch intended to provide libpq connection compression to "replace SSL
> compression" which was doomed when the patch was written, and have since been
> removed altogether.  The initial approach didn't get much traction but there
> was significant discussion and work, which has since fizzled out.  The patch
> has been updated but there hasn't been meaningful review the past months, the
> last comments seem to imply there being a fair amount of questionmarks left in
> here.  Robert, having been very involved in this do you have any thoughts on
> where we are and where to go (if at all IYO)?

I'm not Robert, but I still have an opinion here, and it's that this
feature would at best be an attractive nuisance.  If you need compression
on a database session, it probably means that the connection is over the
open internet, which means that you need encryption even more.  And we
know that compression and encryption do not play well together.  The
reason compression was taken out of the latest TLS standards is not that
they wouldn't have liked to have it, nor that applying compression in a
separate code layer would be any safer.  I fear offering this would
merely lead people to build CVE-worthy setups.
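
To make the risk concrete: these attacks work because compressed length
leaks how much attacker-supplied data overlaps secret data in the same
stream.  A rough sketch of that length oracle, using TOAST column
compression as a stand-in for wire compression (sizes are indicative only):

    CREATE TABLE demo (payload text);
    -- repeat the secret so the value crosses the compression threshold
    INSERT INTO demo
      SELECT repeat('secret=hunter2&', 200) || 'guess=' || g
      FROM (VALUES ('hunter2'), ('abcdefg')) v(g);
    -- the row whose guess matches the secret compresses slightly smaller
    SELECT pg_column_size(payload) FROM demo;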

            regards, tom lane



Re: Commitfest 2021-11 Patch Triage - Part 2

From
Pavel Stehule
Date:
Hi

2839: possibility to rename root namespace in plpgsql
=====================================================
This patch addresses a feature request for making plpgsql easier to work with,
but the approach has been questioned with the thread stalling since March in
one of these discussion.  Is there interest in picking this up or should we
close it as Returned with Feedback?

This feature is in the category "nice to have, but we can live well without it".  There are a few requests per decade for this feature.  I think my design is good enough, but surely it is not the only one possible.

We cannot take inspiration from Oracle's PL/SQL or from Ada, because that requires a feature that is not supported in Postgres (pragmas).  Without pragmas it is relatively hard to find a good design.  PL/pgSQL options are not native elements of the language (though I think they are the best option), and other designs are not fully correct (although they can look more natural).  I don't see any passion (negative or positive) about this feature, and I think it can be Returned with Feedback.  Maybe somebody will come up with other proposals, maybe we will have pragmas one day.

Much more important is extending the debug support of PL/pgSQL. This patch can simplify implementations of PL/pgSQL debuggers, and this feature can be a real benefit for PL/pgSQL developers.

Regards

Pavel



Re: Commitfest 2021-11 Patch Triage - Part 2

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:
> > 2773: libpq compression
> > =======================
> > This patch intended to provide libpq connection compression to "replace SSL
> > compression" which was doomed when the patch was written, and have since been
> > removed altogether.  The initial approach didn't get much traction but there
> > was significant discussion and work, which has since fizzled out.  The patch
> > has been updated but there hasn't been meaningful review the past months, the
> > last comments seem to imply there being a fair amount of questionmarks left in
> > here.  Robert, having been very involved in this do you have any thoughts on
> > where we are and where to go (if at all IYO)?
>
> I'm not Robert, but I still have an opinion here, and that it's that this
> feature would at best be an attractive nuisance.  If you need compression
> on a database session, it probably means that the connection is over the
> open internet, which means that you need encryption even more.  And we
> know that compression and encryption do not play well together.  The
> reason compression was taken out of the latest TLS standards is not that
> they wouldn't have liked to have it, nor that applying compression in a
> separate code layer would be any safer.  I fear offering this would
> merely lead people to build CVE-worthy setups.

I've got an opinion on this also and I don't agree that needing
compression means you're on the open internet or that we shouldn't allow
users the choice to decide if they want to use compression and
encryption together or not.  Yes, there's potential risks there, but I
don't think those risks would lead to CVEs against PG for supporting a
mode where we allow compression and then also allow encryption- if
that's a problem then it's an issue for the encryption library/method
being used and isn't the fault of us for allowing that combination.

Further, the general issue with encryption+compression attacks is when
the attacker is able to inject chosen plaintext content into what ends
up being compressed and encrypted, and then is able to observe the
compressed+encrypted traffic, and that's not nearly as easy to do when
you're talking about a database connection vs. a case where an attacker
is able to convince a victim to go to a malicious website and through
that is able to do cross-site requests to inject the known plaintext
into requests to a trusted webpage and see the results coming back.

That we don't have any support for such a thing in the PG protocol, that app
server to DB traffic can certainly be substantial (meaning compression is
worthwhile even on a private network), and that users may wish to encrypt even
on a private network, are all good reasons why we should support both
compression and encryption and allow users to choose what makes sense for them
in their environment.

Thanks,

Stephen


RE: Commitfest 2021-11 Patch Triage - Part 2

From
Floris Van Nee
Date:
> Daniel Gustafsson <daniel@yesql.se> writes:
> > 2773: libpq compression
> > =======================
> > This patch intended to provide libpq connection compression to
> > "replace SSL compression" which was doomed when the patch was
> written,
> > and have since been removed altogether.  The initial approach didn't
> > get much traction but there was significant discussion and work, which
> > has since fizzled out.  The patch has been updated but there hasn't
> > been meaningful review the past months, the last comments seem to
> > imply there being a fair amount of questionmarks left in here.
> > Robert, having been very involved in this do you have any thoughts on
> where we are and where to go (if at all IYO)?
>
> I'm not Robert, but I still have an opinion here, and that it's that this feature
> would at best be an attractive nuisance.  If you need compression on a
> database session, it probably means that the connection is over the open
> internet, which means that you need encryption even more.  And we know
> that compression and encryption do not play well together.  The reason
> compression was taken out of the latest TLS standards is not that they
> wouldn't have liked to have it, nor that applying compression in a separate
> code layer would be any safer.  I fear offering this would merely lead people
> to build CVE-worthy setups.
>

I don't have a strong opinion on the patch but I'd just like to mention that there are definitely use cases for using
compression on private networks (with ssl off).  We have cases where we have a PG extension that hooks into the COPY ...
FROM command and accepts lz4-compressed data for the COPY data message specifically.  This has proven invaluable to us
even on private networks, because it can give a good compression ratio for very little CPU decompression overhead.



Re: Commitfest 2021-11 Patch Triage - Part 2

From
Andrew Dunstan
Date:
On 11/9/21 09:12, Daniel Gustafsson wrote:
>
> 2901: SQL/JSON: functions
> 2901: SQL/JSON: JSON_TABLE
> ==========================
> Grouping these two together as they are in very similar states.  The patches
> seems to mostly move forward by rebases fixing smaller things.  Andrew, do you
> have a sense of how far these are from a committable state?
>
>

For various reasons I'm going to be hors de combat for most of the rest
of the year, back at the beginning of January, although I will be around
if necessary, and probably tinkering with one or two things. I think at
least the first set is not in bad shape, but it would benefit from some
fresh eyeballs. Tom Kincaid is trying to arrange some extra resources
for that, so we might well see some progress soon.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Commitfest 2021-11 Patch Triage - Part 2

From
Bruce Momjian
Date:
On Tue, Nov  9, 2021 at 12:43:20PM -0500, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Daniel Gustafsson <daniel@yesql.se> writes:
> > I'm not Robert, but I still have an opinion here, and that it's that this
> > feature would at best be an attractive nuisance.  If you need compression
> > on a database session, it probably means that the connection is over the
> > open internet, which means that you need encryption even more.  And we
> > know that compression and encryption do not play well together.  The
> > reason compression was taken out of the latest TLS standards is not that
> > they wouldn't have liked to have it, nor that applying compression in a
> > separate code layer would be any safer.  I fear offering this would
> > merely lead people to build CVE-worthy setups.
> 
> I've got an opinion on this also and I don't agree that needing
> compression means you're on the open internet or that we shouldn't allow
> users the choice to decide if they want to use compression and
> encryption together or not.  Yes, there's potential risks there, but I
> don't think those risks would lead to CVEs against PG for supporting a
> mode where we allow compression and then also allow encryption- if
> that's a problem then it's an issue for the encryption library/method
> being used and isn't the fault of us for allowing that combination.

Yeah, I am leaning in this direction too.  On the one hand, the number of
ways to exploit compression-then-encryption seems to continue to grow,
but it is the complex behavior of HTTPS and connecting to multiple sites
which seems to always be required, and we don't do that with the
Postgres client/server protocol.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Commitfest 2021-11 Patch Triage - Part 2

From
Michael Paquier
Date:
On Tue, Nov 09, 2021 at 03:12:39PM +0100, Daniel Gustafsson wrote:
> 2800: CLUSTER on partitioned table
> ==================================
> The the patch above, it too spawned off from the REINDEX CONCURRENTLY thread
> but seems to have stalled awaiting more review.  There was scepticism raised
> over the usefulness vs code weight tradeoff, but the response to this haven't
> really been followed up on.  Alvaro and Michael, do you have any changed
> thoughts on Justins patch reorg or do you stil feel this isn't worth going
> ahead with?

My feeling on this one is that the addition was not worth the
complexity, and I got the impression that the handling of
pg_index.indisclustered was rather blurry when it came to multiple
layers of partitions.

> 2861: pg_upgrade test for binary compatibility of core data types
> =================================================================
> This patch is also actively worked on and reviewed, and bits from this thread
> have already landed.  Do you think this can land in its entirety in this CF
> Michael?

The remaining parts are the addition of a SQL test, which requires
more comments and documentation as the thing is complex to follow, and
the switch of the upgrade SQL queries to a file.  I need to do a pass
over it, but that seems worth having on its own.  For the second
point, I wanted to double-check first with Andrew about how much code
we could cut in the buildfarm client code.

Thanks for the summary!
--
Michael


Re: Commitfest 2021-11 Patch Triage - Part 2

From
Andrey Borodin
Date:

> Daniel Gustafsson <daniel@yesql.se> writes:
> 
>> 2773: libpq compression
>> =======================
>> This patch intended to provide libpq connection compression to "replace SSL
>> compression" which was doomed when the patch was written, and have since been
>> removed altogether. The initial approach didn't get much traction but there
>> was significant discussion and work, which has since fizzled out. The patch
>> has been updated but there hasn't been meaningful review the past months, the
>> last comments seem to imply there being a fair amount of questionmarks left in
>> here. Robert, having been very involved in this do you have any thoughts on
>> where we are and where to go (if at all IYO)?
> 
> I'm not Robert, but I still have an opinion here, and that it's that this
> feature would at best be an attractive nuisance. If you need compression
> on a database session, it probably means that the connection is over the
> open internet, which means that you need encryption even more. And we
> know that compression and encryption do not play well together. The
> reason compression was taken out of the latest TLS standards is not that
> they wouldn't have liked to have it, nor that applying compression in a
> separate code layer would be any safer. I fear offering this would
> merely lead people to build CVE-worthy setups.
> 

Compression is crucial for highly available setups.  Replication traffic is often billed, or the route has bandwidth
limits.
The entropy added by WAL headers makes a CRIME attack against replication encryption impractical.

Best regards, Andrey Borodin.



Re: Commitfest 2021-11 Patch Triage - Part 2

From
Daniel Gustafsson
Date:
> On 9 Nov 2021, at 18:43, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Daniel Gustafsson <daniel@yesql.se> writes:
>>> 2773: libpq compression
>>> =======================
>>> This patch intended to provide libpq connection compression to "replace SSL
>>> compression" which was doomed when the patch was written, and have since been
>>> removed altogether.  The initial approach didn't get much traction but there
>>> was significant discussion and work, which has since fizzled out.  The patch
>>> has been updated but there hasn't been meaningful review the past months, the
>>> last comments seem to imply there being a fair amount of questionmarks left in
>>> here.  Robert, having been very involved in this do you have any thoughts on
>>> where we are and where to go (if at all IYO)?
>>
>> I'm not Robert, but I still have an opinion here, and that it's that this
>> feature would at best be an attractive nuisance.  If you need compression
>> on a database session, it probably means that the connection is over the
>> open internet, which means that you need encryption even more.  And we
>> know that compression and encryption do not play well together.  The
>> reason compression was taken out of the latest TLS standards is not that
>> they wouldn't have liked to have it, nor that applying compression in a
>> separate code layer would be any safer.  I fear offering this would
>> merely lead people to build CVE-worthy setups.
>
> I've got an opinion on this also and I don't agree that needing
> compression means you're on the open internet or that we shouldn't allow
> users the choice to decide if they want to use compression and
> encryption together or not.  Yes, there's potential risks there, but I
> don't think those risks would lead to CVEs against PG for supporting a
> mode where we allow compression and then also allow encryption- if
> that's a problem then it's an issue for the encryption library/method
> being used and isn't the fault of us for allowing that combination.

I'm not so sure; if the parts aren't vulnerable separately on their own, but
the combination of them + libpq allows for weakening encryption in a TLS
stream, I'd say the CVE is on us.

> Further, the general issue with encryption+compression attacks is when
> the attacker is able to inject chosen plaintext content into what ends
> up being compressed and encrypted, and then is able to observe the
> compressed+encrypted traffic, and that's not nearly as easy to do when
> you're talking about a database connection vs. a case where an attacker
> is able to convince a victim to go to a malicious website and through
> that is able to do cross-site requests to inject the known plaintext
> into requests to a trusted webpage and see the results coming back.

I agree with this; the known weaknesses rely on mechanisms that aren't readily
applicable to postgres connections.  I don't remember if the patch already does
this, but if not we should ensure we only negotiate compression after
authentication, as OpenSSH does [0].

--
Daniel Gustafsson        https://vmware.com/

[0] https://www.openssh.com/txt/draft-miller-secsh-compression-delayed-00.txt


Re: Commitfest 2021-11 Patch Triage - Part 2

From
Nikolay Samokhvalov
Date:
On Tue, Nov 9, 2021 at 7:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:
>> 2773: libpq compression
>> =======================
>> This patch intended to provide libpq connection compression to "replace SSL
>> compression" which was doomed when the patch was written, and have since been
>> removed altogether.  The initial approach didn't get much traction but there
>> was significant discussion and work, which has since fizzled out.  The patch
>> has been updated but there hasn't been meaningful review the past months, the
>> last comments seem to imply there being a fair amount of questionmarks left in
>> here.  Robert, having been very involved in this do you have any thoughts on
>> where we are and where to go (if at all IYO)?
>
> I'm not Robert, but I still have an opinion here, and that it's that this
> feature would at best be an attractive nuisance.  If you need compression
> on a database session, it probably means that the connection is over the
> open internet, which means that you need encryption even more.

This assumption is very vague. I personally had multiple cases of network bandwidth problems for app <--> Postgres communication that were fixed either via upgrades (spending more money) or by making application developers cache more data (optimization), or, usually, both. Public internet connections were never involved.

Actually, any growing startup experiences such issues sooner or later. Network compression would be a great tool for many setups, and some other popular database systems have offered it for a long time.

Re: Commitfest 2021-11 Patch Triage - Part 2

From
Tomas Vondra
Date:

On 11/10/21 00:21, Bruce Momjian wrote:
> On Tue, Nov  9, 2021 at 12:43:20PM -0500, Stephen Frost wrote:
>> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>>> Daniel Gustafsson <daniel@yesql.se> writes:
>>> I'm not Robert, but I still have an opinion here, and that it's that this
>>> feature would at best be an attractive nuisance.  If you need compression
>>> on a database session, it probably means that the connection is over the
>>> open internet, which means that you need encryption even more.  And we
>>> know that compression and encryption do not play well together.  The
>>> reason compression was taken out of the latest TLS standards is not that
>>> they wouldn't have liked to have it, nor that applying compression in a
>>> separate code layer would be any safer.  I fear offering this would
>>> merely lead people to build CVE-worthy setups.
>>
>> I've got an opinion on this also and I don't agree that needing
>> compression means you're on the open internet or that we shouldn't allow
>> users the choice to decide if they want to use compression and
>> encryption together or not.  Yes, there's potential risks there, but I
>> don't think those risks would lead to CVEs against PG for supporting a
>> mode where we allow compression and then also allow encryption- if
>> that's a problem then it's an issue for the encryption library/method
>> being used and isn't the fault of us for allowing that combination.
> 
> Yeah, I am leaning this diretion too.  On the one hand, the number of
> ways to exploit compression-then-encryption seem to continue to grow,
> but it is the complex behavior of HTTPS and connecting to multiple sites
> which seems to always be required, and we don't do that with the
> Postgres client/server protocol.
> 

+0.5

I don't think the number of attacks grows all that much, actually. Or 
more precisely, the new vulnerabilities are just variations on the CRIME 
pattern, adapting it to different contexts.

For example Voracle [1] applies to openvpn connections, but in principle 
it's the same scenario - observing changes to the compressed size, after 
tweaking the requests.

[1] https://community.openvpn.net/openvpn/wiki/VORACLE

So it's true compression is a side-channel, but these attacks require 
other ingredients too - ability to MITM the connection, and ability to 
inject custom data.

Doing MITM is probably harder than when attacking HTTPS, because DB 
connections tend to be between app server and db server - so setting up 
a rogue AP does not really work. Sometimes people / DBAs may connect to 
the DB directly, although in a way that makes it hard to modify the 
data (which is why psql/ssh are immune to this). But it's possible.

But how hard is injecting custom data in a way that'd make a CRIME-like 
issue possible? Yes, you could hijack a webapp similarly to CRIME, in a 
way that initiates database requests with custom data. But we don't 
generally send both the supplied and actual password to the DB to do the 
check there - we just send one of them, and check it against the value 
stored in the DB.

The one (somewhat) plausible scenario I can think of is an application doing

    UPDATE t SET field = user_data WHERE id = 1;
    SELECT field FROM t WHERE ...;

in which case we might learn if some other row in the result contains 
the same value. Not sure how practical this could be, though.

AFAIK OpenVPN still supports tunnel compression, but it's disabled by 
default. That probably says something about what they think about the 
feature, although it might just be that, having had compression, they 
did not want to simply remove it.

One interesting idea OpenVPN allows is asymmetric compression, i.e. 
allowing compression in just one direction. For example it might be 
possible for the client to send compressed data, but the server would 
never send compressed data. Or vice versa. That might still help with e.g. 
large imports or exports, depending on the use case, while maybe 
mitigating some issues. But it's clearly not a solid defense.


regards


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Commitfest 2021-11 Patch Triage - Part 2

From
Tomas Vondra
Date:
On 11/10/21 16:54, Andrey Borodin wrote:
> 
> 
>> Daniel Gustafsson <daniel@yesql.se> writes:
>>
>>> 2773: libpq compression
>>> =======================
>>> This patch intended to provide libpq connection compression to "replace SSL
>>> compression" which was doomed when the patch was written, and have since been
>>> removed altogether. The initial approach didn't get much traction but there
>>> was significant discussion and work, which has since fizzled out. The patch
>>> has been updated but there hasn't been meaningful review the past months, the
>>> last comments seem to imply there being a fair amount of questionmarks left in
>>> here. Robert, having been very involved in this do you have any thoughts on
>>> where we are and where to go (if at all IYO)?
>>
>> I'm not Robert, but I still have an opinion here, and that it's that this
>> feature would at best be an attractive nuisance. If you need compression
>> on a database session, it probably means that the connection is over the
>> open internet, which means that you need encryption even more. And we
>> know that compression and encryption do not play well together. The
>> reason compression was taken out of the latest TLS standards is not that
>> they wouldn't have liked to have it, nor that applying compression in a
>> separate code layer would be any safer. I fear offering this would
>> merely lead people to build CVE-worthy setups.
>>
> 
> Compression is crucial for highly available setups. Replication traffic is often billed. Or route has bandwidth
> limits.
> An entropy added by WAL headers makes CRIME attack against replication encryption impractical.

I very much doubt WAL headers are a reliable protection against CRIME, 
because the entropy of the headers is likely fairly constant. So if you 
compress the WAL stream, the WAL headers may change but the compression 
ratio should be pretty similar. At least that's my guess.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Commitfest 2021-11 Patch Triage - Part 2

From
Andrey Borodin
Date:

> On 11/10/21 16:54, Andrey Borodin wrote:
> 
>> Compression is crucial for highly available setups. Replication traffic is often billed. Or route has bandwidth
>> limits.
>> An entropy added by WAL headers makes CRIME attack against replication encryption impractical.
> 
> I very much doubt WAL headers are a reliable protection against CRIME,
> because the entropy of the headers is likely fairly constant. So if you
> compress the WAL stream, the WAL headers may change but the compression
> ratio should be pretty similar. At least that's my guess.

I've thought more about it and I agree.
To reliably protect against CRIME, the entropy of WAL headers must be comparable with the entropy of possibly injected
data.
If that were required, our WAL would probably need a really serious rework.

Maybe just refuse to enable compression on SSL connections? If someone really needs both - they will just patch a server
on their own.
 
Or make a GUC "yes_i_know_what_crime_is_give_grant_read_on_my_data_to_spies".

Best regards, Andrey Borodin.



Re: Commitfest 2021-11 Patch Triage - Part 2

From
Stephen Frost
Date:
Greetings,

* Andrey Borodin (x4mmm@yandex-team.ru) wrote:
> > On 11/10/21 16:54, Andrey Borodin wrote:
> >> Compression is crucial for highly available setups. Replication traffic is often billed. Or route has bandwidth
> >> limits.
> >> An entropy added by WAL headers makes CRIME attack against replication encryption impractical.
> >
> > I very much doubt WAL headers are a reliable protection against CRIME,
> > because the entropy of the headers is likely fairly constant. So if you
> > compress the WAL stream, the WAL headers may change but the compression
> > ratio should be pretty similar. At least that's my guess.
>
> I've thought more about it and I agree.
> To reliably protect against CRIME entropy of WAL headers must be comparable with the entropy of possibly injected
> data.
> If this would stand, probably, our WAL would need a really serious rework.
>
> Maybe just refuse to enable compression on SSL connection? If someone really needs both - they will just patch a
> server on their own.
> Or make a GUC "yes_i_kwow_what_crime_is_give_grant_read_on_my_data_to_spies".

Attackers aren't likely to have the kind of isolated control over the
data in the WAL stream (which is a combination of data from lots of
ongoing activity in the system and isn't likely to be exactly what the
attacker supplied at some higher level anyway) and the ability to read
and analyze the WAL stream from a primary to a replica to be able to
effectively attack it.

None of this kind of babysitting makes any sense to me.

I'm certainly fine with documenting that there are cases where
compression of data sent by an attacker and then sent over an encrypted
channel has been able to break the encryption and let admins decide if
it's appropriate or not for them to use in their environment.

Thanks,

Stephen


Re: Commitfest 2021-11 Patch Triage - Part 2

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> Attackers aren't likely to have the kind of isolated control over the
> data in the WAL stream (which is a combination of data from lots of
> ongoing activity in the system and isn't likely to be exactly what the
> attacker supplied at some higher level anyway) and the ability to read
> and analyze the WAL stream from a primary to a replica to be able to
> effectively attack it.

Yeah, I concur with that so far as WAL data goes.  A hypothetical attacker
will not have control over xact IDs, tuple TIDs, etc, which will add
enough entropy to the stream that extracting data payloads seems pretty
infeasible.

My concern upthread was about client-session connections, where such
mitigation doesn't apply.  (I wonder a bit about logical-replication
streams, too.)

            regards, tom lane



Re: Commitfest 2021-11 Patch Triage - Part 2

From
Robert Haas
Date:
On Tue, Nov 9, 2021 at 9:12 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> 2773: libpq compression
> =======================
> This patch intended to provide libpq connection compression to "replace SSL
> compression" which was doomed when the patch was written, and have since been
> removed altogether.  The initial approach didn't get much traction but there
> was significant discussion and work, which has since fizzled out.  The patch
> has been updated but there hasn't been meaningful review the past months, the
> last comments seem to imply there being a fair amount of questionmarks left in
> here.  Robert, having been very involved in this do you have any thoughts on
> where we are and where to go (if at all IYO)?

It's been a while since I looked at that patch, and my recollections
are somewhat fuzzy. However, I think that my main feeling was that the
quality of the patch didn't seem to be that close to what I felt would
be required for a commit. I think at that point there were still
significant questions about the details of the approach as well as
code-quality and performance concerns. It may have gotten better since
last I looked.

I do agree with the comments from others that there is room to debate
(1) the usefulness of this and (2) the degree to which it might leave
us on the hook for security problems that wouldn't be our
responsibility if we left this up to OpenSSL (or maybe some other SSL
library, if we ever get pluggability there, which I hope we do). But
my concerns were more pragmatic than theoretical.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Commitfest 2021-11 Patch Triage - Part 2

From
Daniel Gustafsson
Date:
> On 15 Nov 2021, at 15:32, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Nov 9, 2021 at 9:12 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>> 2773: libpq compression
>> =======================
>> This patch intended to provide libpq connection compression to "replace SSL
>> compression" which was doomed when the patch was written, and have since been
>> removed altogether.  The initial approach didn't get much traction but there
>> was significant discussion and work, which has since fizzled out.  The patch
>> has been updated but there hasn't been meaningful review the past months, the
>> last comments seem to imply there being a fair amount of questionmarks left in
>> here.  Robert, having been very involved in this do you have any thoughts on
>> where we are and where to go (if at all IYO)?
>
> It's been a while since I looked at that patch, and my recollections
> are somewhat fuzzy.

No worries, thanks for sharing!

> However, I think that my main feeling was that the
> quality of the patch didn't seem to be that close to what I felt would
> be required for a commit. I think at that point there were still
> significant questions about the details of the approach as well as
> code-quality and performance concerns. It may have gotten better since
> last I looked.

Context switching into this thread is hard, so it's not clear how much the
patch has moved since you reviewed, or how the current patchset relates to the
earlier one as there seems to have been a shift at a recent(ish) point in time.
The last message (which is unreadable HTML soup in the archives) talks about
implementing lz4 with "Looking forward to post an update soon."

Given the discussions had and the doubts over the approach, I wonder if the
best course of action with the CF entry in question is to close it, hash out
the what and how on -hackers, and reopen a new entry when there is a patch to
consider.  We are not doing ourselves, or the CF process, any favors by moving
an entry along which isn't all that close to committable.

Thoughts (addressing everyone involved/interested)?

> I do agree with the comments from others that there is room to debate
> (1) the usefulness of this and (2) the degree to which it might leave
> us on the hook for security problems that wouldn't be our
> responsibility if we left this up to OpenSSL (or maybe some other SSL
> library, if we ever get pluggability there, which I hope we do). But
> my concerns were more pragmatic than theoretical.

I share your hope here =)  However I don't think we'll get much help with
compression from the TLS libraries.  OpenSSL has deprecated transport
compression, NSS never implemented it in the first place, and TLS 1.3 removes
it altogether.  That being said, this is mainly due to protocols where chosen
plaintext attacks are practical/possible.  For our case we, IMHO, should look
more at how OpenSSH and other protocols closer to ours have built compression
in a safe way, to see if we can achieve the level of transport safety we
require.

But there is, I think, broad consensus that we at least must fully understand
the tradeoffs should we allow compression.

--
Daniel Gustafsson        https://vmware.com/




Re: Commitfest 2021-11 Patch Triage - Part 2

From
Stephen Frost
Date:
Greetings,

* Daniel Gustafsson (daniel@yesql.se) wrote:
> > On 15 Nov 2021, at 15:32, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Tue, Nov 9, 2021 at 9:12 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> >> 2773: libpq compression
> >> =======================
> >> This patch intended to provide libpq connection compression to "replace SSL
> >> compression" which was doomed when the patch was written, and have since been
> >> removed altogether.  The initial approach didn't get much traction but there
> >> was significant discussion and work, which has since fizzled out.  The patch
> >> has been updated but there hasn't been meaningful review the past months, the
> >> last comments seem to imply there being a fair amount of questionmarks left in
> >> here.  Robert, having been very involved in this do you have any thoughts on
> >> where we are and where to go (if at all IYO)?
> >
> > It's been a while since I looked at that patch, and my recollections
> > are somewhat fuzzy.
>
> No worries, thanks for sharing!
>
> > However, I think that my main feeling was that the
> > quality of the patch didn't seem to be that close to what I felt would
> > be required for a commit. I think at that point there were still
> > significant questions about the details of the approach as well as
> > code-quality and performance concerns. It may have gotten better since
> > last I looked.
>
> Context switching into this thread is hard, so it's not clear how much the
> patch has moved since you reviewed, or how the current patchset relates to the
> earlier one as there seems to have been a shift at a recent(ish) point in time.
> The last message (which is unreadable HTML soup in the archives) talks about
> implementing lz4 with "Looking forward to post an update soon."

Not sure where the best place to drop this comment is, but the patch
seems to be doing exactly what I was hoping it didn't- it's just taking
the ultimate results and compressing them and sending them on to the
client, at least that's what it looks like in a quick look.

What about when that data has already been compressed and stored
compressed and not manipulated at all?  The initial post in that thread
talks about returning data from 'large JSON columns'.  With this
approach, we would likely end up taking compressed json/jsonb data,
decompressing it, and then re-compressing it to send to the client.

That seems ... less than ideal.

I get that just compressing the entire stream is simpler and easier and
such, but it's surely cheaper and more efficient to not decompress and
then recompress data that's already compressed.  Finding a way to pass
through data that's already compressed when stored as-is while also
supporting compression of everything else (in a sensible way- wouldn't
make sense to just compress each attribute independently since a 4 byte
integer isn't going to get smaller with compression) definitely
complicates the overall idea but perhaps would be possible to do.

> Given the discussions had and the doubts over approach I wonder if the best
> course of action with the CF entry in question is to close it, hash out the
> what and how on -hackers and reopen a new entry when there is a patch which.
> We are not doing ourselves, or the CF process, any favors by moving an entry
> along which isn't all that close to committable.
>
> Thoughts (addressing everyone involved/interested)?

Yeah, agreed with this in general.

Thanks,

Stephen


Re: Commitfest 2021-11 Patch Triage - Part 2

From
Robert Haas
Date:
On Mon, Nov 15, 2021 at 2:51 PM Stephen Frost <sfrost@snowman.net> wrote:
> I get that just compressing the entire stream is simpler and easier and
> such, but it's surely cheaper and more efficient to not decompress and
> then recompress data that's already compressed.  Finding a way to pass
> through data that's already compressed when stored as-is while also
> supporting compression of everything else (in a sensible way- wouldn't
> make sense to just compress each attribute independently since a 4 byte
> integer isn't going to get smaller with compression) definitely
> complicates the overall idea but perhaps would be possible to do.

To me, this feels like an attempt to move the goalposts far enough to
kill the project. Sure, in a perfect world, that would be nice. But,
we don't do it anywhere else. If you try to store a JPEG into a bytea
column, we'll try to compress it just like we would any other data,
and it may not work out. If you then take a pg_basebackup of the
database using -Z, there's no attempt made to avoid the CPU overhead of
compressing those TOAST table pages that contain already-compressed
data and not the others. And it's easy to understand why that's the
case: when you insert data into the database, there's no way for the
database to magically know whether that data has been previously
compressed by some means, and if so, how effectively. And when you back
up a database, the backup doesn't know which relfilenodes contain TOAST
tables or which pages of those relfilenodes contain data that is
already pre-compressed. In both cases,
your options are either (1) shut off compression yourself or (2) hope
that the compressor doesn't waste too much effort on it.

I think the same approach ought to be completely acceptable here. I
don't even really understand how we could do anything else. printtup()
just gets datums, and it has no idea whether or how they are toasted.
It calls the type output functions which don't know that data is being
prepared for transmission to the client as opposed to some other
hypothetical way you could call that function, nor do they know what
compression method the client wants. It does not seem at all
straightforward to teach them that ... and even if they did, what
then? It's not like every column value is sent as a separate packet;
the whole row is a single protocol message, and some columns may be
compressed and others uncompressed. Trying to guess what to do about
that seems to boil down to a sheer guess. Unless you try to compress
that mixture of compressed and uncompressed values - and it's
moderately uncommon for every column of a table to even be
toastable - you aren't going to know how well it will compress. You
could easily waste more CPU cycles trying to guess than you would have
spent just doing what the user asked for.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Commitfest 2021-11 Patch Triage - Part 2

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Nov 15, 2021 at 2:51 PM Stephen Frost <sfrost@snowman.net> wrote:
> > I get that just compressing the entire stream is simpler and easier and
> > such, but it's surely cheaper and more efficient to not decompress and
> > then recompress data that's already compressed.  Finding a way to pass
> > through data that's already compressed when stored as-is while also
> > supporting compression of everything else (in a sensible way- wouldn't
> > make sense to just compress each attribute independently since a 4 byte
> > integer isn't going to get smaller with compression) definitely
> > complicates the overall idea but perhaps would be possible to do.
>
> To me, this feels like an attempt to move the goalposts far enough to
> kill the project. Sure, in a perfect world, that would be nice. But,
> we don't do it anywhere else. If you try to store a JPEG into a bytea
> column, we'll try to compress it just like we would any other data,
> and it may not work out. If you then take a pg_basebackup of the
> database using -Z, there's no attempt made to avoid the overhead of
> CPU overhead of compressing those TOAST table pages that contain
> already-compressed data and not the others. And it's easy to
> understand why that's the case: when you insert data into the
> database, there's no way for the database to magically know whether
> that data has been previously compressed by some means, and if so, how
> effectively. And when you back up a database, the backup doesn't know
> which relfilenodes contain TOAST tables or which pages of those
> relfilenodes contain that is already pre-compressed. In both cases,
> your options are either (1) shut off compression yourself or (2) hope
> that the compressor doesn't waste too much effort on it.

I'll grant that perhaps it's just a different project.

While, sure, we will try to compress things we don't understand by
default, the admin does have ways to tell us to not do that and so it
isn't always just guess-work.  As for other parts of the system not
being smarter- they should be.  I'd argue that up until quite recently
it didn't make sense to teach something like pg_basebackup about TOAST
tables because our compression was rather lacking and it was actually
useful to compress TOAST tables (perhaps still is with the compression
methods we have now, but we'll hopefully add more).  We should probably
look at trying to provide a way for pg_basebackup and other backup tools
to identify TOAST tables to avoid trying to compress them as it's just
wasted effort (though in that case, it's at least less wasted than here-
we don't decompress that TOAST data and then re-compress it).

Using bytea for everything-and-the-kitchen-sink also just generally
seems like it's throwing away more information than we really should (if
we had a jpeg data type, for example, we could just have that default to
not being compressed and not spend time trying to compress jpegs).

I also just generally disagree that an inefficiency in one part of the
system justifies having it somewhere else.

> I think the same approach ought to be completely acceptable here. I
> don't even really understand how we could do anything else. printtup()
> just gets datums, and it has no idea whether or how they are toasted.
> It calls the type output functions which don't know that data is being
> prepared for transmission to the client as opposed to some other
> hypothetical way you could call that function, nor do they know what
> compression method the client wants. It does not seem at all
> straightforward to teach them that ... and even if they did, what
> then? It's not like every column value is sent as a separate packet;
> the whole row is a single protocol message, and some columns may be
> compressed and others uncompressed. Trying to guess what to do about
> that seems to boil down to a sheer guess. Unless you try to compress
> that mixture of compressed and uncompressed values - and it's
> moderately uncommon for every column of a table to be even be
> toastable - you aren't going to know how well it will compress. You
> could easily waste more CPU cycles trying to guess than you would have
> spent just doing what the user asked for.

I agree that we wouldn't want to be doing this all based on guesswork
somehow.  Having a way to identify what's compressed and what isn't
would be needed, as well as what method was used to compress.  It's not
like we don't know that at the source, we'd just need a way to pass that
back through- probably more-or-less exactly as how it's stored today.

Tackling the actual protocol side of it seems like it'd be the more
sensible place to start working through a design and I don't think
that'd be trivial to do, so probably not something to try and hash out
in this particular thread.  I'm happy to have at least voiced that
thought.

Thanks,

Stephen


Re: Commitfest 2021-11 Patch Triage - Part 2

From
Daniil Zakhlystov
Date:
Hi! It’s been a while since the original patch release. Let me provide a brief
overview of the current patch status.

The initial approach was to use the streaming compression to compress all
outgoing and decompress all incoming bytes. However, after the long discussion
in the thread, the initial approach has been changed.

The current implementation allows compressing only specific message types,
using different compression algorithms for different message types, and
configuring the allowed compression methods and levels both server- and
client-side via a GUC setting / connection string respectively.

Also, the current implementation combines (when possible) multiple protocol
messages into a single CompressedData message for a better compression ratio.
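
As a rough illustration (treat the exact GUC and connection option names as
placeholders for whatever the current patchset uses):

    -- server side: which algorithms/levels the server will accept
    SET libpq_compression = 'zstd:5,lz4';

with the client requesting e.g. compression=zstd in its connection string.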

>
> On 16 Nov 2021, at 01:23, Robert Haas <robertmhaas@gmail.com> wrote:
>
> To me, this feels like an attempt to move the goalposts far enough to
> kill the project. Sure, in a perfect world, that would be nice. But,
> we don't do it anywhere else. If you try to store a JPEG into a bytea
> column, we'll try to compress it just like we would any other data,
> and it may not work out. If you then take a pg_basebackup of the
> database using -Z, there's no attempt made to avoid the overhead of
> CPU overhead of compressing those TOAST table pages that contain
> already-compressed data and not the others. And it's easy to
> understand why that's the case: when you insert data into the
> database, there's no way for the database to magically know whether
> that data has been previously compressed by some means, and if so, how
> effectively. And when you back up a database, the backup doesn't know
> which relfilenodes contain TOAST tables or which pages of those
> relfilenodes contain that is already pre-compressed. In both cases,
> your options are either (1) shut off compression yourself or (2) hope
> that the compressor doesn't waste too much effort on it.
>
> I think the same approach ought to be completely acceptable here. I
> don't even really understand how we could do anything else. printtup()
> just gets datums, and it has no idea whether or how they are toasted.
> It calls the type output functions which don't know that data is being
> prepared for transmission to the client as opposed to some other
> hypothetical way you could call that function, nor do they know what
> compression method the client wants. It does not seem at all
> straightforward to teach them that ... and even if they did, what
> then? It's not like every column value is sent as a separate packet;
> the whole row is a single protocol message, and some columns may be
> compressed and others uncompressed. Trying to guess what to do about
> that seems to boil down to a sheer guess. Unless you try to compress
> that mixture of compressed and uncompressed values - and it's
> moderately uncommon for every column of a table to be even be
> toastable - you aren't going to know how well it will compress. You
> could easily waste more CPU cycles trying to guess than you would have
> spent just doing what the user asked for.
>

Agree. From my POV, it is OK to use the protocol message type and length to decide
whether it should be compressed or not. Also, this can be optimized later without
the need to change the protocol.

Regarding the LZ4 support patch, it still has some minor polishing to do.  Basically, it only adds
LZ4 algorithm support and does not change anything fundamentally, so I would appreciate someone
doing a review of the current patch version.

The original thread is quite huge, which I guess makes it hard to catch up with the current patch status.
I can make a new one with a detailed summary if that would help.

Thanks,

Daniil Zakhlystov




Re: Commitfest 2021-11 Patch Triage - Part 2

From
Rémi Lapeyre
Date:
>
> 2780: Allow COPY "text" to output a header and add header matching mode to COPY
> FROM
> ===============================================================================
> The original patch was rejected for lacking matching, but it has since been
> addressed in an updated patch which has seen a lot of review.  Reading the
> thread it seems this should really be in Ready for Committer.

I don’t think there is anything more that needs to be done for this patch.  I have been checking that it applies cleanly
and that the tests still pass, but it has already had multiple rounds of review and is probably ready.

Best regards,
Rémi