Re: Multivariate MCV stats can leak data to unprivileged users - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Multivariate MCV stats can leak data to unprivileged users
Msg-id 20190520144517.kt2m5lk3em7sfkxu@development
In response to Re: Multivariate MCV stats can leak data to unprivileged users  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, May 20, 2019 at 09:32:24AM -0400, Tom Lane wrote:
>Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>> Oh, right. It still has the disadvantage that it obfuscates the actual
>>> data stored in the pg_stats_ext_data (or whatever would it be called),
>>> so e.g. functions would have to do additional checks to make sure it
>>> actually is the right statistic type. For example pg_mcv_list_items()
>>> could not rely on receiving pg_mcv_list values, as per the signature,
>>> but would have to check the value.
>
>> Yes. In fact, since the user-accessible view would want to expose
>> datatypes specific to the stats kinds rather than bytea or cstring
>> values, we would need SQL-callable conversion functions for each kind:
>
>It seems like people are willfully misunderstanding my suggestion.
>You'd only need *one* conversion function, which would look at the
>embedded ID field and then emit the appropriate text representation.
>I don't see a reason why we'd have the separate pg_ndistinct etc. types
>any more at all.
>

That would, however, require input functions, which we currently don't
have. Otherwise, people could not process the statistic values using
functions like pg_mcv_list_items(), which I think is useful.

Of course, we could add input functions, but there was a reason for not
having them (the same as for pg_node_tree, which also rejects input).

>> Also this model presupposes that all future stats kinds are most
>> conveniently represented in a single column, but maybe that won't be
>> the case. It's conceivable that a future stats kind would benefit from
>> splitting its data across multiple columns.
>
>Hm, that's possible I suppose, but it seems a little far-fetched.
>You could equally well argue that pg_ndistinct etc. should have been
>broken down into smaller types, but we didn't.
>

True. I can't rule out adding such a "split" statistic type, but I don't
think it's very likely. Extended statistic values tend to be complex, and
are easier to represent as a single value.

>> Yes, I think it is an EAV model. I think EAV models do have their
>> place, but I think that's largely where adding new columns is a common
>> operation and involves adding little to no extra code. I don't think
>> either of those is true for extended stats. What we've seen over the
>> last couple of years is that adding each new stats kind is a large
>> undertaking, involving lots of new code. That alone is going to limit
>> just how many ever get added, and compared to that effort, adding new
>> columns to the catalog is small fry.
>
>I can't argue with that --- the make-work is just a small part of the
>total.  But it's still make-work.
>
>Anyway, it was just a suggestion, and if people don't like it that's
>fine.  But I don't want it to be rejected on the basis of false
>arguments.
>

Sure.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services