Re: libpq compression (part 3) - Mailing list pgsql-hackers

From Jelte Fennema-Nio
Subject Re: libpq compression (part 3)
Date
Msg-id CAGECzQTMEry3HOqn_ajOZU4BnwAV1L67iSL9+2NQpwyFxm12tA@mail.gmail.com
In response to Re: libpq compression (part 3)  (Jacob Champion <jacob.champion@enterprisedb.com>)
Responses Re: libpq compression (part 3)
List pgsql-hackers
On Mon, 20 May 2024 at 21:42, Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> As Andrey points out, there was prior work done that started to take
> this into account. I haven't reviewed it to see how good it is -- and
> I think there are probably many use cases in which queries and tables
> contain both private and attacker-controlled information -- but if we
> agree that they have to be separated, then the strategy can at least
> be improved upon.


To help get everyone on the same page I wanted to list all the
security concerns in one place:

1. Triggering excessive CPU usage before authentication, by asking for
very high compression levels
2. Triggering excessive memory/CPU usage before authentication, by
a client sending a zipbomb
3. Triggering excessive CPU after authentication, by asking for a very
high compression level
4. Triggering excessive memory/CPU after authentication due to
zipbombs (i.e. small amount of data extracting to lots of data)
5. CRIME style leakage of information about encrypted data
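
To make concern 5 concrete, here is a minimal Python sketch of the length
side channel. It assumes a secret and attacker-chosen text end up in the same
zlib stream. The secret value, the separator and the helper name
observed_length are invented for illustration; nothing here is taken from
the patch.

```python
import zlib

# Hypothetical secret that shares a compression stream with attacker input.
SECRET = b"token=k9Xq2LmZ7pRw4TvB"


def observed_length(attacker_text: bytes) -> int:
    """Size an eavesdropper would see for secret + attacker text in one stream."""
    return len(zlib.compress(SECRET + b";" + attacker_text))


# A guess that matches the secret collapses into a single back-reference,
# so the compressed size shrinks even though the payload itself is encrypted.
right = observed_length(b"token=k9Xq2LmZ7pRw4TvB")
wrong = observed_length(b"token=J3hYs8NcD5fGe1Ua")
print(right, wrong)
```

The correct guess yields the smaller size, which is the signal an attacker
would refine one step at a time.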

1 & 2 can easily be solved by not allowing any authentication packets
to be compressed. This also has benefits for 5.

3 & 4 are less of a concern than 1&2 imho. Once authenticated a client
deserves some level of trust. But having knobs to limit impact
definitely seems useful.

3 can be solved in two ways afaict:
a. Allow the server to choose the maximum compression level for each
compression method (using some GUC), and downgrade the level
transparently when a higher level is requested
b. Don't allow the client to choose the compression level that the server uses.

I'd prefer option a.
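
Option a could look roughly like the Python sketch below. The per-method caps
and the name effective_level are assumptions made for illustration; in the
proposal the caps would come from some GUC, whose name is not decided here.

```python
# Hypothetical server-side caps, one per compression method (would be a GUC).
SERVER_MAX_LEVEL = {"zlib": 6, "zstd": 9}


def effective_level(method: str, requested: int) -> int:
    """Return the level the server actually uses for a client request."""
    if method not in SERVER_MAX_LEVEL:
        raise ValueError(f"unsupported compression method: {method}")
    # Downgrade transparently instead of rejecting the connection.
    return min(requested, SERVER_MAX_LEVEL[method])


print(effective_level("zlib", 9))  # request above the cap is downgraded
print(effective_level("zlib", 3))  # request below the cap is kept
```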

4 would require some safety limits on the amount of data that a
(small) compressed message can be decompressed to, and stop
decompression of that message once that limit is hit. What that limit
should be seems hard to choose though. A few ideas:
a. The size of the message reported by the uncompressed header. This
would mean that at most 4GB will be uncompressed, since the maximum
message length is 4GB (limited by the 32-bit message length field)
b. Allow servers to specify a maximum client decompressed message length
lower than this 4GB, e.g. messages of more than 100MB of uncompressed
size should not be allowed.
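
Either limit could be enforced along the lines of this Python sketch, which
uses zlib's max_length argument to stop inflating once the cap is passed. The
function and exception names are made up for illustration, and the 1MB limit
in the usage lines is only an example value.

```python
import zlib


class DecompressionLimitExceeded(Exception):
    """Raised when a message would inflate past the configured limit."""


def bounded_decompress(data: bytes, limit: int) -> bytes:
    """Decompress one message, refusing to produce more than `limit` bytes."""
    d = zlib.decompressobj()
    # Ask for one byte more than allowed so an overrun is detectable
    # without ever inflating the whole payload.
    out = d.decompress(data, limit + 1)
    if len(out) > limit or d.unconsumed_tail:
        raise DecompressionLimitExceeded(f"message exceeds {limit} bytes")
    return out


bomb = zlib.compress(b"\0" * 10_000_000)  # about 10MB from a few KB of input
try:
    bounded_decompress(bomb, 1_000_000)
except DecompressionLimitExceeded as exc:
    print("rejected:", exc)
```

The point of the design is that decompression stops at the limit, so a small
compressed message cannot force a large allocation first.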

I think 5 is the most complicated to deal with, especially as it
depends on the actual usage to know what is safe. I believe we should
let users have the freedom to make their own security tradeoffs, but
we should protect them against some of the most glaring issues
(especially ones that benefit little from compression anyway). As
already shown by Andrey, sending LDAP passwords in a compressed way
seems extremely dangerous. So I think we should disallow compressing
any authentication related packets. To reduce similar risks further we
can choose to compress only the message types that we expect to
benefit most from compression. IMHO those are the following (marked
with (B)ackend or (F)rontend to show who sends them):
- Query (F)
- Parse (F)
- Describe (F)
- Bind (F)
- RowDescription (B)
- DataRow (B)
- CopyData (B/F)

Then I think we should let users choose how they want to compress and
where they want their compression stream to restart. Something like
this:
a. compression_restart=query: Restart the stream after every query.
Recommended if queries across the same connection are triggered by
different end-users. I think this would be a sane default
b. compression_restart=message: Restart the stream for every message.
Recommended if the amount of correlation between rows of the same
query is a security concern.
c. compression_restart=manual: Don't restart the stream automatically,
but only when the client user calls a specific function. Recommended
only if the user can make trade-offs, or if no encryption is used
anyway.
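
A rough Python sketch of the three modes, using zlib as a stand-in. The class
name, the starts_query flag and the restart() method are hypothetical, and
how a restart would be signalled to the peer on the wire is left out
entirely.

```python
import zlib


class RestartingCompressor:
    """Compress messages, resetting the stream according to the chosen mode."""

    def __init__(self, mode: str = "query"):
        if mode not in ("query", "message", "manual"):
            raise ValueError(f"unknown compression_restart mode: {mode}")
        self.mode = mode
        self._stream = zlib.compressobj()

    def restart(self) -> None:
        """Drop all history so later output cannot reference earlier data."""
        self._stream = zlib.compressobj()

    def compress_message(self, payload: bytes, starts_query: bool = False) -> bytes:
        if self.mode == "message" or (self.mode == "query" and starts_query):
            self.restart()
        # Z_SYNC_FLUSH emits everything buffered so the message can be sent now.
        return self._stream.compress(payload) + self._stream.flush(zlib.Z_SYNC_FLUSH)


payload = b"SELECT * FROM accounts WHERE owner = 'alice'"
shared = RestartingCompressor("manual")
first, second = shared.compress_message(payload), shared.compress_message(payload)
print(len(first), len(second))  # the repeat is smaller: history is shared
```

With mode "message" the same two calls produce identical output, because no
history survives between messages. That is the correlation the restart is
meant to remove, at the cost of a worse compression ratio.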


