On 12 December 2012 19:13, Jan Wieck <JanWieck@yahoo.com> wrote:
> On 12/12/2012 1:12 PM, Simon Riggs wrote:
>>
>> Currently, ANALYZE collects data on all columns and stores these
>> samples in pg_statistic where they can be seen via the view pg_stats.
>>
>> In some cases we have data that is private and we do not wish others
>> to see it, such as patient names. This becomes more important when we
>> have row security.
>>
>> Perhaps that data can be protected, but it would be even better if we
>> simply didn't store value-revealing statistic data at all. Such
>> private data is seldom the target of searches, or if it is, it is
>> mostly evenly distributed anyway.
>
>
> Would protecting it the same way we protect the passwords in pg_authid be
> sufficient?

The user backend does need to be able to access the stats data during
optimization. It's hard to have data accessible and yet impose limits
on the uses to which it can be put. If we have row security on the
table but no equivalent capability on the stats, then we'll have
leakage: e.g. set the statistics target to 10000, run ANALYZE, then
leak 10000 credit card numbers.
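
A toy model of that leak, in Python rather than anything resembling the
real backend. Every name here (visible_rows, analyze_column, the sample
rows) is hypothetical; the only point is that a row filter on reads does
nothing for values that were already copied into the stats.

```python
from collections import Counter

# Hypothetical table of (owner, card_number) rows. Not real data.
ROWS = [("alice", "4000-0000-0000-0001"),
        ("bob",   "4000-0000-0000-0002"),
        ("carol", "4000-0000-0000-0003")]

def visible_rows(rows, user):
    """Stand-in for row security: a user sees only their own rows."""
    return [r for r in rows if r[0] == user]

def analyze_column(rows, col, stats_target):
    """Stand-in for ANALYZE: keep up to stats_target most common values.
    It runs over the whole table, not the caller's filtered view."""
    counts = Counter(r[col] for r in rows)
    return [value for value, _ in counts.most_common(stats_target)]

seen_by_bob = {r[1] for r in visible_rows(ROWS, "bob")}
stats = analyze_column(ROWS, col=1, stats_target=10000)
leaked = set(stats) - seen_by_bob   # values bob can reach only via the stats
```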

Selectivity functions are not marked leakproof, nor do people think
they can easily be made so. That means the data might be leaked by
various means: error messages, plan selection, skullduggery, etc.

If it ain't in the bucket, the bucket can't leak it.
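
The same idea as a toy sketch, again in Python with hypothetical names
(collect_stats, private_columns). It is not a proposal for actual syntax
or catalog layout, just the shape of "don't collect it in the first
place".

```python
def collect_stats(table, private_columns=frozenset(), stats_target=100):
    """Toy ANALYZE: table maps column name -> list of values.
    Columns named in private_columns get no value samples at all."""
    stats = {}
    for column, values in table.items():
        if column in private_columns:
            continue  # nothing stored, so nothing to leak
        # Keep at most stats_target distinct values as the "sample".
        stats[column] = sorted(set(values))[:stats_target]
    return stats

# Hypothetical table contents.
patients = {"ward": ["A", "B", "A"],
            "patient_name": ["Smith", "Jones", "Brown"]}

stats = collect_stats(patients, private_columns={"patient_name"})
```

With no stored samples, the optimizer would presumably have to fall back
on default estimates for such a column, which is acceptable if the data
is mostly evenly distributed anyway.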

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services