PRIVATE columns - Mailing list pgsql-hackers

From Simon Riggs
Subject PRIVATE columns
Msg-id CA+U5nMJtFsNdm7fp=s2w07nSFSRKt9yrXmU=g040fOP8pDpEiQ@mail.gmail.com
List pgsql-hackers
Currently, ANALYZE collects data on all columns and stores these
samples in pg_statistic where they can be seen via the view pg_stats.
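
For illustration, a query along these lines shows the value-revealing
columns (foo and foo1 are only the placeholder names used later in this
message):

```sql
-- most_common_vals and histogram_bounds contain actual column values
-- taken from the ANALYZE sample.
SELECT attname, most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 'foo'
  AND attname = 'foo1';
```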

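In some cases we have data that is private and we do not wish others
to see it, such as patient names. This becomes more important when we
have row security, because column statistics are gathered from all rows
and could reveal values from rows a user is not permitted to see.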

Perhaps that data can be protected, but it would be even better if we
simply didn't store value-revealing statistics at all. Such private
data is seldom the target of searches, or if it is, it is mostly
evenly distributed anyway.

It would be good if we could still collect the overall stats
* NULL fraction
* average width
* ndistinct
without storing either the most frequent values (MFVs) or the histogram.
Doing that would avoid inadvertently leaking potentially private information.
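
As a rough sketch of the intended result (this describes the proposal,
not existing behaviour; foo and foo1 are placeholder names), the summary
columns would stay populated while the value-revealing ones would come
back NULL:

```sql
-- Would still be collected under the proposal:
SELECT attname, null_frac, avg_width, n_distinct
FROM pg_stats
WHERE tablename = 'foo' AND attname = 'foo1';

-- Assumed to return NULLs for a PRIVATE column:
SELECT attname, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'foo' AND attname = 'foo1';
```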

SET STATISTICS 0 is not enough for this, since it
simply skips collection of statistics for the column altogether.
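
For comparison, the existing setting looks like this (same placeholder
names):

```sql
-- A target of 0 tells ANALYZE to skip this column entirely,
-- so null_frac, avg_width and n_distinct are not collected either.
ALTER TABLE foo ALTER COLUMN foo1 SET STATISTICS 0;
ANALYZE foo;
```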

To implement this, one way would be to allow

ALTER TABLE foo ALTER COLUMN foo1 SET STATISTICS PRIVATE;

Or we could use another magic value like -2 to request this case.

(Yes, I am aware we could use a custom datatype with a custom
typanalyze for this, but that breaks other things)

Thoughts?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


