Re: PRIVATE columns - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: PRIVATE columns
Msg-id 50C8D740.5000001@Yahoo.com
In response to PRIVATE columns  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: PRIVATE columns
List pgsql-hackers
On 12/12/2012 1:12 PM, Simon Riggs wrote:
> Currently, ANALYZE collects data on all columns and stores these
> samples in pg_statistic where they can be seen via the view pg_stats.
>
> In some cases we have data that is private and we do not wish others
> to see it, such as patient names. This becomes more important when we
> have row security.
>
> Perhaps that data can be protected, but it would be even better if we
> simply didn't store value-revealing statistic data at all. Such
> private data is seldom the target of searches, or if it is, it is
> mostly evenly distributed anyway.

Would protecting it the same way we protect the passwords in pg_authid
be sufficient?
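
For context, a minimal sketch of what that protection looks like,
assuming a session connected as an ordinary non-superuser role: the
catalog itself is not readable, and the public view pg_roles masks the
password column.

```sql
-- Run as an ordinary (non-superuser) role.

-- Direct access to the catalog is refused with a permission error.
SELECT rolname, rolpassword FROM pg_authid;

-- The world-readable view lists the roles but masks the password.
SELECT rolname, rolpassword FROM pg_roles;
```

pg_statistic and pg_stats are already split along similar lines
(pg_stats only shows rows for columns the caller may read), so the
question is whether that kind of protection is enough for private
columns.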


Jan

>
> It would be good if we could collect the overall stats
> * NULL fraction
> * average width
> * ndistinct
> yet without storing either the MFVs or histogram.
> Doing that would avoid inadvertent leaking of potentially private information.
>
> SET STATISTICS 0
> simply skips collection of statistics altogether
>
> To implement this, one way would be to allow
>
> ALTER TABLE foo
>    ALTER COLUMN foo1 SET STATISTICS PRIVATE;
>
> Or we could use another magic value like -2 to request this case.
>
> (Yes, I am aware we could use a custom datatype with a custom
> typanalyze for this, but that breaks other things)
>
> Thoughts?
>


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


