On 12/12/2012 3:12 PM, Simon Riggs wrote:
> On 12 December 2012 19:13, Jan Wieck <JanWieck@yahoo.com> wrote:
>> On 12/12/2012 1:12 PM, Simon Riggs wrote:
>>>
>>> Currently, ANALYZE collects data on all columns and stores these
>>> samples in pg_statistic where they can be seen via the view pg_stats.
>>>
>>> In some cases we have data that is private and we do not wish others
>>> to see it, such as patient names. This becomes more important when we
>>> have row security.
>>>
>>> Perhaps that data can be protected, but it would be even better if we
>>> simply didn't store value-revealing statistic data at all. Such
>>> private data is seldom the target of searches, or if it is, it is
>>> mostly evenly distributed anyway.
>>
>>
>> Would protecting it the same way we protect the passwords in pg_authid be
>> sufficient?
>
> The user backend does need to be able to access the stats data during
> optimization. It's hard to have data accessible and yet impose limits
> on the uses to which that can be put. If we have row security on the
> table but no equivalent capability on the stats, then we'll have
> leakage. e.g. set statistics 10000, ANALYZE, then leak 10000 credit
> card numbers.
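
For concreteness, a rough sketch of the leak described above. The table
and column names (patients, card_number) are made up for illustration;
only the statistics target of 10000 comes from the example.

```sql
-- Hypothetical table and column names, for illustration only.
-- Raise the per-column statistics target to the value in the example.
ALTER TABLE patients ALTER COLUMN card_number SET STATISTICS 10000;

-- Collect a fresh sample; the sampled values end up in pg_statistic.
ANALYZE patients;

-- Anyone who can read this row through pg_stats now sees the sampled
-- values themselves, regardless of any row-level restrictions that
-- apply when querying the table directly.
SELECT most_common_vals, histogram_bounds
  FROM pg_stats
 WHERE tablename = 'patients'
   AND attname = 'card_number';
```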

As with the encrypted password column of pg_authid, I don't see any
reason why the values in the stats columns need to be readable by
anyone but a superuser. Do you?

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin