
From: Tomas Vondra
Subject: Re: Complete data erasure
Date: 2020-02-03 17:30:38
Msg-id: 20200203173038.wawcyw2rzmihj45z@development
In response to: Re: Complete data erasure (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers

On Mon, Feb 03, 2020 at 09:07:09AM -0500, Stephen Frost wrote:
>Greetings,
>
>* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On Tue, Jan 28, 2020 at 02:34:07PM -0500, Stephen Frost wrote:
>> >We certainly can't run external commands during transaction COMMIT, so
>> >this can't be part of a regular DROP TABLE.
>>
>> IMO the best solution would be that the DROP TABLE does everything as
>> usual, but instead of deleting the relfilenode it moves it to some sort
>> of queue. And then a background worker would "erase" these relfilenodes
>> outside the COMMIT.
>
>That sounds interesting, though I'm a bit worried that it's going to
>lead to the same kind of complications and difficulty that we have with
>deleted columns- anything that's working with the system tables will
>need to see this new "dropped but pending delete" flag.  Would we also
>rename the table when this happens?  Or change the schema it's in?
>Otherwise, your typical DROP IF EXISTS / CREATE could end up breaking.
>

That's not really what I meant - let me explain. When I said DROP TABLE
should do everything as usual, that includes catalog changes. I.e. after
the commit there would not be any remaining entries in system catalogs
or anything like that.

The only thing we'd do differently is that instead of unlinking the
relfilenode segments, we'd move the relfilenode to a persistent queue
(essentially a regular table used as a queue of relfilenodes). The
background worker would watch the queue, and when it gets a new
relfilenode it'd "delete" the data and then remove the relfilenode from
the queue.
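
To make this a bit more concrete, here is a minimal sketch of what the
queue might look like (all names are invented for illustration, this is
not an actual proposal):

    -- hypothetical queue table; DROP TABLE would insert a row here
    -- at commit instead of unlinking the relfilenode segments
    CREATE TABLE pg_erasure_queue (
        dboid       oid         NOT NULL,   -- database
        spcoid      oid         NOT NULL,   -- tablespace
        relfilenode oid         NOT NULL,
        enqueued_at timestamptz NOT NULL DEFAULT now(),
        PRIMARY KEY (dboid, spcoid, relfilenode)
    );

Deleting the row only after the segments are actually erased is what
would make this crash-safe - if the worker dies mid-erasure, the row is
still in the queue after restart and the work gets retried.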

So essentially other sessions would not even be able to see the (now
dropped) object, and they could create a new object with the same name
etc.
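
For example, assuming the hypothetical queue table above, this should
keep working exactly as it does today:

    DROP TABLE secrets;    -- catalog entries gone at commit, as usual;
                           -- the old relfilenode just sits in the queue

    CREATE TABLE secrets (id int);   -- fine, gets a new relfilenode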

I imagine we might provide a way to wait for the deletion to actually
complete (we can't do that as part of the DROP TABLE, though), so that
people can be sure when the data is actually gone (for scripts etc.).
A simple function that waits for the queue to become empty might be
enough, I guess, but maybe not.
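
Something along these lines, perhaps (again, the function and queue
table names are just made up to illustrate the idea):

    -- hypothetical helper: block until all pending erasures complete
    CREATE FUNCTION pg_wait_for_erasure() RETURNS void AS $$
    BEGIN
        WHILE EXISTS (SELECT 1 FROM pg_erasure_queue) LOOP
            PERFORM pg_sleep(0.1);
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

One problem is that waiting for the whole queue to drain also waits for
erasures requested by other sessions, which may be more than the caller
bargained for.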

>> And yes, we need to do this in a way that works with replicas, i.e. we
>> need to WAL-log it somehow. And it should to be done in a way that works
>> when the replica is on a different type of filesystem.
>
>I agree it should go through WAL somehow (ideally without needing an
>actual zero'd or whatever page for every page in the relation), but why
>do we care about the filesystem on the replica?  We don't have anything
>that's really filesystem specific in WAL replay today and I don't see
>this as needing to change that..
>

I think this depends on what our requirements are.

My assumption is that when you perform this "secure data erasure" on the
primary, you probably also want to erase the data on the replica. But if
the instances use different file systems (COW vs. non-COW, ...) the
exact thing that needs to happen may be different. Or maybe the replica
does not need to do anything at all, making it a no-op?

In that case we don't need to WAL-log the exact change for each page;
it might even be fine to WAL-log nothing except the final removal of
the relfilenode from the queue. I mean, the data is worthless and not
used by anyone at this point, so there's no point in replicating it ...

I haven't thought about this very hard. It's not clear what should
happen if we complete the erasure on the primary, remove the
relfilenode from the queue, and then the replica restarts before it
finishes its local erasure. The queue (if represented by a simple
table) is replicated, so the replica would forget it still has work
to do.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


