Re: libpq compression - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: libpq compression
Date
Msg-id CA+q6zcUPrssNaRS+FyoBsD-F2stK1Roa-4sAhFOfAjOWLziM4g@mail.gmail.com
Whole thread Raw
In response to Re: libpq compression  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: libpq compression
List pgsql-hackers
On Mon, Feb 11, 2019 at 4:52 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2019-02-11 12:46:07 -0300, Alvaro Herrera wrote:
> > On 2019-Feb-11, Andreas Karlsson wrote:
> >
> > > Imagine the following query which uses the session ID from the cookie to
> > > check if the logged in user has access to a file.
> > >
> > > SELECT may_download_file(session_id => $1, path => $2);
> > >
> > > When the query with its parameters is compressed the compressed size will
> > > depend on the similarity between the session ID and the requested path
> > > (assuming Zstd works similar to DEFLATE), so by tricking the web browser
> > > into making requests with specifically crafted paths while monitoring the
> > > traffic between the web server and the database the compressed request size
> > > can be use to hone in the session ID and steal people's login sessions, just
> > > like the CRIME attack[1].
> >
> > I would have said that you'd never let the attacker eavesdrop into the
> > traffic between webserver and DB, but then that's precisely the scenario
> > you'd use SSL for, so I suppose that even though this attack is probably
> > just a theoretical risk at this point, it should definitely be
> > considered.
>
> Right.
>
>
> > Now, does this mean that we should forbid libpq compression
> > completely?  I'm not sure -- maybe it's still usable if you encapsulate
> > that traffic so that the attackers cannot get at it, and there's no
> > reason to deprive those users of the usefulness of the feature.  But
> > then we need documentation warnings pointing out the risks.
>
> I think it's an extremely useful feature, and can be used in situation
> where this kind of attack doesn't pose a significant
> danger. E.g. pg_dump, pg_basebackup seem candidates for that, and even
> streaming replication seems much less endangered than sessions with lots
> of very small queries.  But I think that means it needs to be
> controllable from both the server and client, and default to off
> (although it doesn't seem crazy to allow it in the aforementioned
> cases).

Totally agree with this point.

On Thu, Nov 29, 2018 at 8:13 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On Mon, Aug 13, 2018 at 8:48 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > I agree with the critiques from Robbie Harwood and Michael Paquier
> > that the way in that compression is being hooked into the existing
> > architecture looks like a kludge.  I'm not sure I know exactly how it
> > should be done, but the current approach doesn't look natural; it
> > looks like it was bolted on.
>
> After some time spend reading this patch and investigating different points,
> mentioned in the discussion, I tend to agree with that. As far as I see it's
> probably the biggest disagreement here, that keeps things from progressing.

To address this point, I've spent some time playing with the patch and to see
how it would look like if compression logic would not incorporate everything
else. So far I've come up with a buffers juggling that you can find in the
attachments. It's based on the v11 (I'm not able to keep up with Konstantin),
doesn't change all relevant code (mostly stuff around secure read/write), and
most likely misses a lot of details, but at least works as expected
while testing
manually with psql (looks like tests are broken with the original v11, so I
haven't tried to run them with the attached patch either). I wonder if this
approach could be more natural for the feature?

Also I have few questions/commentaries:

* in this discussion few times was mentioned a situation when compression logic
  nees to read more data to decompress the current buffer. Am I understand
  correctly, that it's related to Z_BUF_ERROR, when zlib couldn't make any
  progress and needs more data? If yes, where is it handled?

* one of the points was that the code should be copy pasted between be/fe for a
  proper separation, but from attached experimental patch it doesn't look that
  a lot of code should be duplicated.

* I've noticed on v11 a problem, when an attempt to connect to a db without
  specified compression failed with the error "server is not supported
  requested compression algorithm". E.g. when I'm just executing createdb.

I haven't looked at the negotiation logic yet, hope to do that soon.

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Challenges preventing us moving to 64 bit transaction id (XID)?
Next
From: Tom Lane
Date:
Subject: Re: reducing isolation tests runtime