Re: PRIVATE columns - Mailing list pgsql-hackers

From Tom Lane
Subject Re: PRIVATE columns
Date
Msg-id 14642.1355345874@sss.pgh.pa.us
In response to PRIVATE columns  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: PRIVATE columns  (Simon Riggs <simon@2ndQuadrant.com>)
Re: PRIVATE columns  (Kohei KaiGai <kaigai@kaigai.gr.jp>)
List pgsql-hackers
Simon Riggs <simon@2ndQuadrant.com> writes:
> Currently, ANALYZE collects data on all columns and stores these
> samples in pg_statistic where they can be seen via the view pg_stats.

Only if you have appropriate privileges.

> In some cases we have data that is private and we do not wish others
> to see it, such as patient names. This becomes more important when we
> have row security.

> Perhaps that data can be protected, but it would be even better if we
> simply didn't store value-revealing statistic data at all.

SET STATISTICS 0 seems like a sufficient solution for people who don't
trust the has_column_privilege() protection in the pg_stats view.
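
A minimal sketch of that approach, assuming a hypothetical table "patients" with a sensitive column "name" (both names are illustrative and do not come from this thread):

```sql
-- Hypothetical table and column names, for illustration only.
-- Set the per-column statistics target to 0 so that ANALYZE
-- does not gather statistics for this column.
ALTER TABLE patients ALTER COLUMN name SET STATISTICS 0;

-- Later ANALYZE runs skip the column.
ANALYZE patients;

-- Inspect what the current role can see for that column in pg_stats.
SELECT attname, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'patients' AND attname = 'name';
```

The trade-off is that the planner has to fall back on default estimates for predicates on that column.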

In practice I think this is a waste of time, though.  Anyone who can
bypass the view restriction can probably just read the original table.

(I suppose we could consider marking pg_stats as a security_barrier
view to make this even safer.  Not sure it's worth the trouble though;
the interesting columns are anyarray so it's hard to do much with them
mechanically.)

> It would be good if we could collect the overall stats
> * NULL fraction
> * average width
> * ndistinct
> yet without storing either the MFVs or histogram.

Do you have any evidence whatsoever that that's worth the trouble?
I'd bet against it.  And if we're being paranoid, who's to say that
those numbers couldn't reveal useful data in themselves?
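
For reference, the summary figures in question appear as ordinary columns of pg_stats alongside the value-revealing ones. A sketch of looking at them alone, again assuming a hypothetical table "patients" in the "public" schema:

```sql
-- Hypothetical schema and table names, for illustration only.
-- null_frac, avg_width and n_distinct are the summary statistics;
-- most_common_vals and histogram_bounds are deliberately left out.
SELECT attname, null_frac, avg_width, n_distinct
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'patients';
```

A negative n_distinct is expressed as a fraction of the row count, so -1 means every sampled value was distinct, which is already a small piece of information about the data.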
        regards, tom lane


