Re: Multivariate MCV stats can leak data to unprivileged users - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: Multivariate MCV stats can leak data to unprivileged users
Date
Msg-id CAEZATCXtSSJmZGNiP8N1=hyHhrE3vM6w53qTpoiJgy6Jtai_3g@mail.gmail.com
Whole thread Raw
In response to Re: Multivariate MCV stats can leak data to unprivileged users  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Multivariate MCV stats can leak data to unprivileged users  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Mon, 20 May 2019 at 14:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> > On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >> Oh, right. It still has the disadvantage that it obfuscates the actual
> >> data stored in the pg_stats_ext_data (or whatever would it be called),
> >> so e.g. functions would have to do additional checks to make sure it
> >> actually is the right statistic type. For example pg_mcv_list_items()
> >> could not rely on receiving pg_mcv_list values, as per the signature,
> >> but would have to check the value.
>
> > Yes. In fact, since the user-accessible view would want to expose
> > datatypes specific to the stats kinds rather than bytea or cstring
> > values, we would need SQL-callable conversion functions for each kind:
>
> It seems like people are willfully misunderstanding my suggestion.

I'm more than capable of inadvertently misunderstanding, without the
need to willfully do so :-)

> You'd only need *one* conversion function, which would look at the
> embedded ID field and then emit the appropriate text representation.
> I don't see a reason why we'd have the separate pg_ndistinct etc. types
> any more at all.

Hmm, OK. So then would you also make the user-accessible view agnostic
about the kinds of stats supported in the same way, returning zero or
more rows per STATISTICS object, depending on how many kinds of stats
have been built? That would have the advantage of never needing to
change the view definition again, as more stats kinds are supported.

We'd need to change pg_mcv_list_items() to accept a pg_stats_ext_data
value rather than a pg_mcv value, and it would be the user's
responsibility to call it if they wanted to see the contents of the
MCV list (I was originally thinking that we'd include a call to
pg_mcv_list_items() in the view definition, so that it produced
friendlier looking output, since the default textual representation of
an MCV list is completely opaque, unlike the other stats kinds).
Actually, I can see another advantage to not including
pg_mcv_list_items() in the view definition -- in the future, we may
dream up a better version of pg_mcv_list_items(), like say one that
produced JSON, and then we'd regret using the current function.

> Anyway, it was just a suggestion, and if people don't like it that's
> fine.  But I don't want it to be rejected on the basis of false
> arguments.

To be clear, I'm not intentionally rejecting your idea. I'm merely
trying to fully understand the implications.

At this stage, perhaps it would be helpful to prototype something for
comparison.

Regards,
Dean



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Multivariate MCV stats can leak data to unprivileged users
Next
From: Andres Freund
Date:
Subject: Re: Statistical aggregate functions are not working with PARTIAL aggregation