Re: Multivariate MCV stats can leak data to unprivileged users - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Multivariate MCV stats can leak data to unprivileged users |
Date | |
Msg-id | 20190613173745.pjmvxq7x4ceoosck@development Whole thread Raw |
In response to | Re: Multivariate MCV stats can leak data to unprivileged users (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: Multivariate MCV stats can leak data to unprivileged users
|
List | pgsql-hackers |
On Mon, Jun 10, 2019 at 02:32:04PM +0100, Dean Rasheed wrote: >On Thu, 6 Jun 2019 at 21:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> Hi, >> >> Attached are three patches tweaking the stats - two were already posted >> in this thread, the third one is just updating docs. >> >> 1) 0001 - split pg_statistic_ext into definition + data >> >> This is pretty much the patch Dean posted some time ago, rebased to >> current master (fixing just minor pgindent bitrot). >> >> 2) 0002 - update sgml docs to reflect changes from 0001 >> >> 3) 0003 - define pg_stats_ext view, similar to pg_stats >> > >Seems reasonable on a quick read-through, except I spotted a bug in >the view (my fault) -- the statistics_owner column should come from >s.stxowner rather than c.relowner. > > >> The question is whether we want to also redesign pg_statistic_ext_data >> per Tom's proposal (more about that later), but I think we can treat >> that as an additional step on top of 0001. So I propose we get those >> changes committed, and then perhaps also switch the data table to the >> EAV model. >> >> Barring objections, I'll do that early next week, after cleaning up >> those patches a bit more. >> >> One thing I think we should fix is naming of the attributes in the 0001 >> patch. At the moment both catalogs use "stx" prefix - e.g. "stxkind" is >> in pg_statistic_ext, and "stxmcv" is in pg_statistic_ext_data. We should >> probably switch to "stxd" in the _data catalog. Opinions? >> > >Yes, that makes sense. Especially when joining the 2 tables, since it >makes it more obvious which table a given column is coming from in a >join clause. > OK, attached are patches fixing the issues reported by you and John Naylor, and squashing the parts into just two patches (catalog split and pg_stats_ext). Barring objections, I'll push those tomorrow. I've renamed columns in the _data catalog from 'stx' to 'stxd', which I think is appropriate given the "data" in catalog name. I'm wondering if we should change the examples in SGML docs (say, in planstats.sgml) to use the new pg_stats_ext view, instead of querying the catalogs directly. I've tried doing that, but I found the results less readable than what we currently have (especially for the MCV list, where it'd require matching elements in multiple arrays). So I've left this unchanged for now. > >> Now, back to the proposal to split the _data catalog rows to EAV form, >> with a new data type replacing the multiple types we have at the moment. >> I've started hacking on it today, but the more I work on it the less >> useful it seems to me. >> >> My understanding is that with that approach we'd replace the _data >> catalog (which currently has one column per statistic type, with a >> separate data type) with 1:M generic rows, with a generic data type. >> That is, we'd replace this >> >> CREATE TABLE pg_statistic_ext_data ( >> stxoid OID, >> stxdependencies pg_dependencies, >> stxndistinct pg_ndistinct, >> stxmcv pg_mcv_list, >> ... histograms ... >> ); >> >> with something like this: >> >> CREATE TABLE pg_statistiex_ext_data ( >> stxoid OID, >> stxkind CHAR, >> stxdata pg_statistic_ext_type >> ); >> >> where pg_statistic_ext would store all existing statistic types. along >> with a "flag" saying which value it actually stored (essentially a copy >> of the stxkind column, which we however need to lookup a statistic of a >> certain type, without having to detoast the statistic itself). >> >> As I mentioned before, I kinda dislike the fact that this obfuscates the >> actual statistic type by hiding it behing the "wrapper" type. >> >> The other thing is that we have to deal with 1:M relationship every time >> we (re)build the statistics, or when we need to access them. Now, it may >> not be a huge amount of code, but it just seems unnecessary. It would >> make sense if we planned to add large number of additional statistic >> types, but that seems unlikely - I personally can think of maybe one new >> statistic type, but that's about it. >> >> I'll continue working on it and I'll share the results early next week, >> after playing with it a bit, but I think we should get the existing >> patches committed and then continue discussing this as an additional >> improvement. >> > >I wonder ... would it be completely crazy to just use a JSON column to >store the extended stats data? > >It wouldn't be as compact as your representation, but it would allow >for future stats kinds without changing the catalog definitions, and >it wouldn't obfuscate the stats types. You could keep the 1:1 >relationship, and have top-level JSON keys for each stats kind built, >and you wouldn't need the pg_mcv_list_items() function because you >could just put the MCV data in JSON arrays, which would be much more >transparent, and would make the user-accessible view much simpler. One >could also imagine writing regression tests that checked for specific >expected MCV values like "stxdata->'mcv'->'frequency'->0". > You mean storing it as JSONB, I presume? I've actually considered that at some point, but eventually concluded it's not a good match. I mean, JSON(B) is pretty versatile and can be whacked to store pretty much anything, but it has various limitations - e.g. it does not support arbitrary data types, so we'd have to store a lot of stuff as text (through input/output functions). That doesn't seem very nice, I guess. If we want JSONB output, that should not be difficult to generate. But I guess your point was about generic storage format. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: