Re: libpq compression (part 3) - Mailing list pgsql-hackers
From | Jacob Champion |
---|---|
Subject | Re: libpq compression (part 3) |
Date | |
Msg-id | CAOYmi+=mkyjezFRcHnPonv75fSK4KswXv7p0pPy1Hva0-aQOAw@mail.gmail.com |
In response to | Re: libpq compression (part 3) (Jacob Burroughs <jburroughs@instructure.com>) |
Responses |
Re: libpq compression (part 3)
|
List | pgsql-hackers |
On Tue, May 21, 2024 at 8:23 AM Jacob Burroughs <jburroughs@instructure.com> wrote:
> As currently implemented, the compression only applies to
> CopyData/DataRow/Query messages, none of which should be involved in
> authentication, unless I've really missed something in my
> understanding.

Right, but Robert has argued that we should compress it all, and I'm responding to that proposal.

Sorry for introducing threads within threads. But I think it's valuable to pin down both 1) the desired behavior, and 2) how the current proposal behaves, as two separate things. I'll try to do a better job of communicating which I'm talking about.

> > Right, I think it's reasonable to let a sufficiently
> > determined/informed user lift the guardrails, but first we have to
> > choose to put guardrails in place... and then we have to somehow
> > sufficiently inform the users when it's okay to lift them.
>
> My thought would be that compression should be opt-in on the client
> side, with documentation around the potential security pitfalls. (I
> could be convinced it should be opt-in on the server side, but overall
> I think opt-in on the client side generally protects against footguns
> without excessively getting in the way

We absolutely have to document the risks and allow clients to be written safely. But I think server-side controls on risky behavior have proven to be generally more valuable, because the server administrator is often in a better spot to see the overall risks to the system. ("No, you will not use deprecated ciphersuites. No, you will not access this URL over plaintext. No, I will not compress this response containing customer credit card numbers, no matter how nicely you ask.")

There are many more clients than servers, so it's less risky for the server to enforce safety than to hope that every client is safe. Does your database and access pattern regularly mingle secrets with public data? Would auditing correct client use of compression be a logistical nightmare?
Do your app developers keep indicating in conversations that they don't understand the risks at all? Cool, just set `encrypted_compression = nope_nope_nope` on the server and sleep soundly at night. (Ideally we would default to that.)

> and if an attacker controls the
> client, they can just get the information they want directly -- they
> don't need compression sidechannels to get that information.)

Sure, but I don't think that's relevant to the threats being discussed.

> Within SQL-level things, I don't think we can reasonably differentiate
> between private and attacker-controlled information at the
> libpq/server level.

And by the IETF line of argument -- or at least the argument I quoted above -- that implies that we really have no business introducing compression when confidentiality is requested. A stronger approach would require us to prove, or the user to indicate, safety before compressing.

Take a look at the security notes for QPACK [1] -- keeping in mind that they know _more_ about what's going on at the protocol level than we do, due to the header design. And they still say things like "an encoder might choose not to index values with low entropy" and "these criteria ... will evolve over time as new attacks are discovered." A huge amount is left as an exercise for the reader. This stuff is really hard.

> We can reasonably differentiate between message
> types that *definitely* are private and ones that could have
> either/both data in them, but that's not nearly as useful. I think
> not compressing auth-related packets plus giving a mechanism to reset
> the compression stream for clients (plus guidance on the tradeoffs
> involved in turning on compression) is about as good as we can get.

The concept of stream reset seems necessary but insufficient at the application level, which bleeds over into Jelte's compression_restart proposal. (At the protocol level, I think it may be sufficient?)
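
To make the stream-reset idea concrete, here is a minimal sketch of what a reset does and does not buy. It uses Python's `zlib` purely as a stand-in for whatever codec the connection would negotiate; the helper name `followup_bytes` and the message contents are hypothetical, and nothing here reflects the patch's actual framing. A full flush throws away the compressor's history, so a later message can no longer be encoded as a back-reference to an earlier one.

```python
import zlib


def followup_bytes(first_message: bytes, second_message: bytes, reset: bool) -> bytes:
    """Compress two messages on one stream and return only the bytes
    emitted for the second message."""
    stream = zlib.compressobj()
    # Z_FULL_FLUSH drops the history window, so later data cannot
    # back-reference earlier data. Z_SYNC_FLUSH keeps the history.
    mode = zlib.Z_FULL_FLUSH if reset else zlib.Z_SYNC_FLUSH
    stream.compress(first_message)
    stream.flush(mode)
    return stream.compress(second_message) + stream.flush(zlib.Z_SYNC_FLUSH)


# Hypothetical messages: two different secrets of equal length, followed
# by an attacker-influenced message that happens to repeat the first one.
secret_a = b"token=s3cr3t-t0ken-8f3a9c1d"
secret_b = b"token=XKQLMJ9VTZ4RWB7NYP1HG"
probe = b"guess=s3cr3t-t0ken-8f3a9c1d"

# Without a reset, the probe compresses better after the secret it matches.
leaky_a = len(followup_bytes(secret_a, probe, reset=False))
leaky_b = len(followup_bytes(secret_b, probe, reset=False))

# With a reset, the probe's encoding does not depend on the earlier secret.
safe_a = followup_bytes(secret_a, probe, reset=True)
safe_b = followup_bytes(secret_b, probe, reset=True)
```

A reset only separates one message from the next. It does nothing when the secret and the attacker-influenced data sit inside the same compressed message.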

If I write a query where one of the WHERE clauses is attacker-controlled and the other is a secret, I would really like to not compress that query on the client side. If I join a table of user IDs against a table of user-provided addresses and a table of application tokens for that user, compressing even a single row leaks information about those tokens -- at a _very_ granular level -- and I would really like the server not to do that.

So if I'm building sand castles... I think maybe it'd be nice to mark tables (and/or individual columns?) as safe for compression under encryption, whether by row or in aggregate. And maybe libpq and psql should be able to turn outgoing compression on and off at will. And I understand those would balloon the scope of the feature.

I'm worried I'm doing the security-person thing and sucking all the air out of the room. I know not everybody uses transport encryption; for those people, compress-it-all is probably a pretty winning strategy, and there's no need to reset the compression context ever. And the pg_dump-style, "give me everything" use case seems like it could maybe be okay, but I really don't know how to assess the risk there, at all.

> That said, I *think* the feature is reasonable to be
> reviewed/committed without the reset functionality as long as the
> compressed data already has the mechanism built in (as it does) to
> signal when a decompressor should restart its streaming. The actual
> signaling protocol mechanism/necessary libpq API can happen in
> followon work.

Well... working out the security minutiae _after_ changing the protocol is not historically a winning strategy, I think. Better to do it as a vertical stack.

Thanks,
--Jacob

[1] https://www.rfc-editor.org/rfc/rfc9204.html#name-security-considerations
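
The query case described above (one attacker-controlled WHERE clause next to a secret one) can be sketched in a few lines. This is an illustration under stated assumptions, not a description of the patch: the table and column names, the token value, and the helper `compressed_query_len` are all hypothetical, and Python's `zlib` stands in for the negotiated codec. Encryption hides the bytes but not their count, so an observer who can vary one clause and watch message sizes learns whether the guess overlaps the secret.

```python
import zlib

# Hypothetical secret that the application places in every query.
SECRET = "s3cr3t-t0ken-8f3a9c1d"


def compressed_query_len(guess: str) -> int:
    """Length of the compressed query text, which is what an on-path
    observer could still measure under transport encryption."""
    query = (
        "SELECT * FROM accounts "
        f"WHERE api_token = '{SECRET}' AND note = '{guess}'"
    )
    return len(zlib.compress(query.encode()))


# Two guesses of equal length: one repeats the secret, one does not.
matching = compressed_query_len("s3cr3t-t0ken-8f3a9c1d")
unrelated = compressed_query_len("XKQLMJ9VTZ4RWB7NYP1HG")

# The matching guess is encoded as a back-reference to the secret,
# so its query compresses to fewer bytes.
print(matching, unrelated)
```

Because both halves live in one message, resetting the compression stream between messages does not help here; the only fixes are to not compress the message or to keep the two kinds of data apart.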