Re: Complete data erasure - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: Complete data erasure
Msg-id: 20200411182017.5hn3lgxnhf45awzs@development
In response to: Re: Complete data erasure (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Sat, Apr 11, 2020 at 01:56:10PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I don't think "commit is atomic" really implies "data should be released
>> at commit". This is precisely what makes the feature extremely hard to
>> implement, IMHO.
>
>> Why wouldn't it be acceptable to do something like this?
>
>>      BEGIN;
>>      ...
>>      DROP TABLE x ERASE;
>>      ...
>>      COMMIT;  <-- Don't do data erasure, just add "x" to queue.
>
>>      -- wait for another process to complete the erasure
>>      SELECT pg_wait_for_erasure();
>
>> That means we're not running any custom commands / code during commit,
>> which should (hopefully) make it easier to handle errors.
>
>Yeah, adding actions-that-could-fail to commit is a very hard sell,
>so something like this API would probably have a better chance.
>
>However ... the whole concept of erasure being a committable action
>seems basically misguided from here.  Consider this scenario:
>
>    begin;
>
>    create table full_o_secrets (...);
>
>    ... manipulate secret data in full_o_secrets ...
>
>    drop table full_o_secrets erase;
>
>    ... do something that unintentionally fails, causing xact abort ...
>
>    commit;
>
>Now what?  Your secret data is all over the disk and you have *no*
>recourse to get rid of it; that's true even at a very low level,
>because we unlinked the file when rolling back the transaction.
>If the error occurred before getting to "drop table full_o_secrets
>erase" then there isn't even any way in principle for the server
>to know that you might not be happy about leaving that data lying
>around.
>
>And I haven't even spoken of copies that may exist in WAL, or
>have been propagated to standby servers by now.
>
>I have no idea what an actual solution that accounted for those
>problems would look like.  But as presented, this is a toy feature
>offering no real security gain, if you ask me.
>

Yeah, unfortunately the feature as proposed has these weaknesses.

This is why I proposed that a solution based on encryption and throwing
away the key might be more reliable - if you no longer have the key, who
cares whether the encrypted data file (or parts of it) is still on disk?
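
Just to illustrate the principle, here is a toy standalone sketch - the
names are made up and XOR merely stands in for a real cipher, so none of
this is meant as actual server code:

    /*
     * Toy illustration of "erasure by destroying the key".  Names are made
     * up and XOR stands in for a real cipher - this is not secure, it only
     * shows that without the key the leftover bytes are unreadable.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define KEY_LEN 32

    typedef struct RelKey
    {
        unsigned char key[KEY_LEN];
        int         valid;      /* cleared when the relation is "erased" */
    } RelKey;

    /* stand-in cipher: XOR the buffer with the per-relation key */
    static void
    xor_crypt(const RelKey *k, unsigned char *buf, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            buf[i] ^= k->key[i % KEY_LEN];
    }

    /* "erase" the relation: overwrite and invalidate its key */
    static void
    erase_relation_key(RelKey *k)
    {
        memset(k->key, 0, KEY_LEN);
        k->valid = 0;
    }

    int
    main(void)
    {
        RelKey      k = {.valid = 1};
        unsigned char page[16] = "secret tuple...";

        for (int i = 0; i < KEY_LEN; i++)
            k.key[i] = (unsigned char) rand();

        xor_crypt(&k, page, sizeof(page));  /* what would sit on disk */
        erase_relation_key(&k);             /* DROP TABLE ... ERASE */

        printf("key valid after erase: %d\n", k.valid);
        return 0;
    }

Once the key is destroyed, any ciphertext still lingering in the data
files, in WAL or on a standby is useless without it.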

It has issues too, though - a query might need a temporary file to do a
sort, a hash join might spill to disk, and so on. Those spill files won't
be encrypted without some executor changes (e.g. we might propagate a
"needs erasure" flag to temp files, and do the erasure when necessary -
see the sketch below).

I suspect a perfect solution would be so complex it's not feasible in
practice, especially not in v1. So maybe the best thing we can do is to
document those limitations, but I'm not sure where to draw the line
between acceptable and unacceptable limitations.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


