Re: Multivariate MCV stats can leak data to unprivileged users - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Multivariate MCV stats can leak data to unprivileged users
Date
Msg-id 20190520181714.fm4a4yjtgffh6kst@development
Whole thread Raw
In response to Re: Multivariate MCV stats can leak data to unprivileged users  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: Multivariate MCV stats can leak data to unprivileged users  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Mon, May 20, 2019 at 04:09:24PM +0100, Dean Rasheed wrote:
>On Mon, 20 May 2019 at 14:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> > On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> >> Oh, right. It still has the disadvantage that it obfuscates the actual
>> >> data stored in the pg_stats_ext_data (or whatever would it be called),
>> >> so e.g. functions would have to do additional checks to make sure it
>> >> actually is the right statistic type. For example pg_mcv_list_items()
>> >> could not rely on receiving pg_mcv_list values, as per the signature,
>> >> but would have to check the value.
>>
>> > Yes. In fact, since the user-accessible view would want to expose
>> > datatypes specific to the stats kinds rather than bytea or cstring
>> > values, we would need SQL-callable conversion functions for each kind:
>>
>> It seems like people are willfully misunderstanding my suggestion.
>
>I'm more than capable of inadvertently misunderstanding, without the
>need to willfully do so :-)
>
>> You'd only need *one* conversion function, which would look at the
>> embedded ID field and then emit the appropriate text representation.
>> I don't see a reason why we'd have the separate pg_ndistinct etc. types
>> any more at all.
>
>Hmm, OK. So then would you also make the user-accessible view agnostic
>about the kinds of stats supported in the same way, returning zero or
>more rows per STATISTICS object, depending on how many kinds of stats
>have been built? That would have the advantage of never needing to
>change the view definition again, as more stats kinds are supported.
>

If I got Tom's proposal right, there'd be only one statistic value in
each pg_stats_ext_data value. It'd be a very thin wrapper, essentially
just the value itself + type flag. So for example if you did

  CREATE STATISTICS s (ndistinct, mcv, dependencies) ...

you'd get three rows in pg_statistic_ext_data (assuming all the stats
get actually built).

>We'd need to change pg_mcv_list_items() to accept a pg_stats_ext_data
>value rather than a pg_mcv value, and it would be the user's
>responsibility to call it if they wanted to see the contents of the
>MCV list (I was originally thinking that we'd include a call to
>pg_mcv_list_items() in the view definition, so that it produced
>friendlier looking output, since the default textual representation of
>an MCV list is completely opaque, unlike the other stats kinds).
>Actually, I can see another advantage to not including
>pg_mcv_list_items() in the view definition -- in the future, we may
>dream up a better version of pg_mcv_list_items(), like say one that
>produced JSON, and then we'd regret using the current function.
>

Yeah. As I said, it obfuscates the "actual" type of the stats value, so
we can no longer rely on the function machinery to verify the type. All
functions dealing with the "wrapper" type would have to verify it
actually contains the right statistic type.

>> Anyway, it was just a suggestion, and if people don't like it that's
>> fine.  But I don't want it to be rejected on the basis of false
>> arguments.
>
>To be clear, I'm not intentionally rejecting your idea. I'm merely
>trying to fully understand the implications.
>
>At this stage, perhaps it would be helpful to prototype something for
>comparison.
>

I'll look into that. I'll try to whip something up before pgcon, but I
can't guarantee that :-(


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Jeremy Finzel
Date:
Subject: Question about new pg12 feature no-rewrite timestamp to timestamptz conversion
Next
From: Justin Pryzby
Date:
Subject: Re: clean up docs for v12