Re: Complete data erasure - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Complete data erasure
Msg-id 20200128232456.yi5yzbpgtaz6sskf@development
In response to Re: Complete data erasure  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Complete data erasure
List pgsql-hackers
On Tue, Jan 28, 2020 at 02:34:07PM -0500, Stephen Frost wrote:
>Greetings,
>
>* asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
>> From: Stephen Frost <sfrost@snowman.net>
>> > * asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
>> > > This feature erases data area just before it is returned to the OS
>> > > ("erase" here means overwriting the data area to hide its contents)
>> > > because there is a risk that the data will be restored by attackers
>> > > if it is returned to the OS without being overwritten.
>> > > The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
>> >
>> > Looking at this fresh, I wanted to point out that I think Tom's right-
>> > we aren't going to be able to reasonably support this kind of data
>> > erasure on a simple DROP TABLE or TRUNCATE.
>> >
>> > > I want users to be able to customize the erasure method for their
>> > > security policies.
>> >
>> > There's also this- but I think what it means is that we'd probably have
>> > a top-level command that basically is "ERASE TABLE blah;" or similar
>> > which doesn't operate during transaction commit but instead marks the
>> > table as "to be erased" and then perhaps "erasure in progress" and then
>> > "fully erased" (or maybe just back to 'normal' at that point).  Making
>> > those updates will require the command to perform its own transaction
>> > management which is why it can't be in a transaction itself but also
>> > means that the data erasure process doesn't need to be done during
>> > commit.
>> >
>> > > My idea is adding a new parameter erase_command to postgresql.conf.
>> >
>> > Yeah, I don't think that's really a sensible option or even approach.
>>
>> I think erase_command can also manage the state of a table.
>> The exit status of a configured command shows it (0 is "fully erased" or "normal", 1 is "erasure in progress").
>> erase_command is executed not during a transaction but when unlink() is executed.
>
>I really don't see what the advantage of having this be configurable is.
>In addition, an external command's actions wouldn't be put through the
>WAL meaning that replicas would have to be dealt with in some other way
>beyond regular WAL and that seems like it'd just be ugly.
>
>> (for example, after a transaction that has done DROP TABLE)
>
>We certainly can't run external commands during transaction COMMIT, so
>this can't be part of a regular DROP TABLE.
>

IMO the best solution would be that the DROP TABLE does everything as
usual, but instead of deleting the relfilenode it moves it to some sort
of queue. And then a background worker would "erase" these relfilenodes
outside the COMMIT.
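
To make the idea concrete, here is a minimal sketch in Python. It is not
PostgreSQL code: the queue directory, the function names, and the choice to
overwrite with zeros are all assumptions made for illustration. The "drop"
step only renames the file into a queue, and a separate worker pass does the
overwrite and unlink later, outside of any commit.

```python
import os

CHUNK = 8192  # overwrite in fixed-size chunks (size chosen arbitrarily here)


def enqueue_for_erasure(path, queue_dir):
    """Hypothetical stand-in for the DROP step: queue the file, do not unlink it."""
    os.makedirs(queue_dir, exist_ok=True)
    queued = os.path.join(queue_dir, os.path.basename(path))
    os.rename(path, queued)  # cheap; no data is touched at this point
    return queued


def erase_file(path):
    """Overwrite the whole file with zeros, flush it to disk, then unlink it."""
    remaining = os.path.getsize(path)
    with open(path, "r+b") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(b"\0" * n)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the zeros reach storage before unlink
    os.unlink(path)


def run_worker_once(queue_dir):
    """One pass of the hypothetical background worker over the queue."""
    erased = []
    for name in sorted(os.listdir(queue_dir)):
        erase_file(os.path.join(queue_dir, name))
        erased.append(name)
    return erased
```

Note that an in-place overwrite like this does not guarantee the old blocks
are gone on copy-on-write filesystems or on SSDs, so a real implementation
would need more than this.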

And yes, we need to do this in a way that works with replicas, i.e. we
need to WAL-log it somehow. And it should be done in a way that works
when the replica is on a different type of filesystem.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
